rates of convergence of our estimators.
1. Introduction. Filaments are one-dimensional curves embedded in
$\mathbb{R}^d$ where $d > 1$. Filament estimation has important applications
in many fields including astronomy, geology, and medicine. Our basic filament
model is

(1) $Y_i = f(U_i) + \epsilon_i, \quad i = 1, \ldots, n$

where $f : [0, 1] \to \mathbb{R}^d$. The unobserved variables $U_1, \ldots, U_n$
are drawn from a distribution $H$ on $[0, 1]$ and $\epsilon_1, \ldots, \epsilon_n$
are drawn from a mean zero noise distribution $F$. The goal is to estimate

(2) $\Gamma \equiv \Gamma_f = \{f(u) : 0 \le u \le 1\}$.

Later, we extend the model to include background clutter, other $Y_i$'s drawn
uniformly from a compact set containing the filaments. See Figure 1.
Estimating $f$ is an example of one-dimensional manifold learning. It may also
be regarded as a type of principal curve estimation.

There is a plethora of available statistical methods that can, in principle,
be used for estimating filaments. These include: principal curves (Hastie and
Stuetzle (1989), Kegl et al. (2000), Sandilya and Kulkarni (2002), and Smola
et al. (2001)); nonparametric, penalized, maximum likelihood (Tibshirani
(1992)); beamlets (Donoho et al. (2001), and Arias-Castro et al. (2006));
parametric models (Stoica et al. (2007)); manifold learning techniques
(Tenenbaum et al. (2000), Roweis and Saul (2000), and Huo and Chen (2002));
gradient based methods (Novikov et al. (2006), and Genovese et al. (2009));
and methods from computational geometry (Dey (2006), Lee (1999), and
Cheng et al. (2005)).

In this paper, we make some connections between the statistical problem
and some ideas from computational geometry. We propose new, simple,
nonparametric estimators for $\Gamma_f$, and we find their rates of
convergence. To the best of our knowledge, our methods are the first that are
computationally simple, consistent, and have given rates of convergence, with
the exception of Cheng et al. (2005). However, our methods are simpler than
those in Cheng et al. (2005), our assumptions are weaker, our loss function is
more stringent and our estimators have faster rates of convergence.

The optimal rates of convergence for this problem appear to be unknown.
In related work (Genovese et al. (2010)) we derived the minimax rate under
stringent conditions. In ongoing work, we are finding the minimax rate under
more general conditions. These rates depend critically on various features
of the noise distribution $F$. The methods in this paper are unlikely to be
minimax optimal. Nonetheless, they achieve reasonable rates of convergence
and are simple to compute.

Fig 1. These plots illustrate the filament model. Top left: some points $U_i$ on [0, 1] are
mapped to $\Gamma_f$ by $f$. Top right: noise is added to the points. Bottom left: a larger sample.
Bottom right: background clutter has been added.

Our basic strategy involves two steps: 

1. Construct a set of fitted values that are close in Hausdorff distance to 
the filament. 

2. Extract a curve from this set of fitted values. 

Motivation. The need to identify filamentary structures arises in a wide 
variety of applications. In medical imaging, for instance, filaments arise as 
networks of blood vessels in tissue and need to be identified and mapped. 
In remote sensing, river systems and road networks are common filamentary 
structures of critical importance (Lacoste et al. (2005); Stoica et al. (2004)). 
In seismology, the concentration of earthquake epicenters traces the filamen- 
tary network of fault lines. Filaments are of particular interest in astronomy 
because the distribution of galaxies in the universe is concentrated on a net- 
work of filaments that is often called the "cosmic web." Indeed, astronomers 
have substantial literature on the problem of estimating filaments; see Luo 
and Vishniac (1995), van de Weygaert and Aragon-Calvo (2009), Martinez 
and Saar (2002), Barrow et al. (1985), Stoica et al. (2005), Eriksen et al. 
(2004), Novikov et al. (2006), Sousbie et al. (2006) and Stoica et al. (2007). 






Summary of Results. Two key geometric ideas underlie our results: the
medial axis of a set and the thickness $\Delta(f)$ of a curve $f$, both of which
are defined in Section 3. The medial axis is like the median of a set. The
thickness of a curve measures both the curvature and how close the curve
comes to being self-intersecting.

Our main results are the following:

1. If the noise level $\sigma$ of $F$ is less than the thickness $\Delta(f)$, the filament
equals the medial axis of the support of the distribution of $Y_i$ (Theorem 3).

2. Any estimate of the boundary of the support of the distribution can
be converted into an estimate of the filament that is close in Hausdorff
distance to the true filament (Theorems 9 and 10). If the rate of
convergence of the boundary estimator is $r_n$ then the rate of convergence
of the filament estimator is also $r_n$.

3. Our estimators produce a set of fitted values that contain the filament 
and are close to it in Hausdorff distance. In Section 5, we show how to 
extract curves from the set estimators that are Hausdorff close to the 
true filament. 

Proofs of all results are given in Section 6.1. 

Notation. The boundary of a set $S$ is denoted by $\partial S$. The Hausdorff
distance between two sets $A_1$ and $A_2$ is

(3) $d_H(A_1, A_2) = \inf\{\delta : A_1 \subset A_2 \oplus \delta \text{ and } A_2 \subset A_1 \oplus \delta\}$

where

(4) $A \oplus \delta = \bigcup_{x \in A} B(x, \delta)$

denotes the $\delta$-enlargement of the set $A$, and $B(x, \delta) = \{y : \|y - x\| \le \delta\}$
denotes a closed ball centered at $x$ with radius $\delta$. If $A$ is a set and $x$ is a point
then we write $d(x, A) = \inf_{y \in A} \|x - y\|$. The closure of $A$ is denoted by $\bar{A}$ and
the complement of $A$ by $A^c$. A curve is a map $f : [0, 1] \to \mathbb{R}^d$. Throughout,
we use symbols like $C, c_0, c_1, \ldots$ to denote generic positive constants whose
value may be different in different expressions.
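
For finite point sets the infimum in (3) reduces to a max-min computation. The following is a minimal sketch of that special case, not part of the estimators themselves; the function names and the toy point sets are illustrative assumptions.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from a point of a to its nearest point in b."""
    return max(min(math.dist(x, y) for y in b) for x in a)

def hausdorff(a, b):
    """Hausdorff distance between two finite, non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Toy example (assumed data): a unit square and a copy shifted by 0.5.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
shifted = [(x + 0.5, y) for x, y in square]
print(hausdorff(square, shifted))
```

Both directed distances are needed: one set can lie entirely inside an enlargement of the other without the converse holding.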

2. The Model. We will focus on finding filaments in a two dimensional
point process although the ideas extend to higher dimensions. We begin with
a single filament. Suppose we observe $Y_1, \ldots, Y_n$ where

(5) $Y_i = f(U_i) + \epsilon_i, \quad i = 1, \ldots, n$

where $f : [0, 1] \to \mathbb{R}^2$, $U_1, \ldots, U_n \sim H$ where $H$ is a distribution on $[0, 1]$
and $\epsilon_1, \ldots, \epsilon_n$ are drawn from $F$.

Denote the graph of the filament $f$ by

(6) $\Gamma \equiv \Gamma_f = \{f(u) : u \in [0, 1]\}$.

With some abuse of terminology, we refer to both $f$ and $\Gamma_f$ as the filament.
We assume that $\Gamma_f$ is contained in a compact set which, without loss of
generality, we take to be $[-1, 1]^2$.

The output of our algorithms will be a set $\hat\Gamma$ which need not be a curve.
Our loss function is Hausdorff distance

(7) $d_H(\Gamma_f, \hat\Gamma) = \inf\{\delta : \hat\Gamma \subset \Gamma_f \oplus \delta \text{ and } \Gamma_f \subset \hat\Gamma \oplus \delta\}$.

We will also show how to extract a curve from $\hat\Gamma$.
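
To make model (5) concrete, here is a small simulation sketch. It assumes a uniform $H$, noise uniform on the disc $B(0, \sigma)$, and a circular filament of radius 0.5; these choices, and the names `sample_filament` and `circle`, are illustrative assumptions rather than anything fixed by the model.

```python
import math
import random

def sample_filament(f, n, sigma, rng):
    """Draw Y_i = f(U_i) + eps_i, i = 1..n.

    Assumptions: U_i ~ Uniform[0, 1]; eps_i uniform on the disc of radius sigma.
    """
    points = []
    for _ in range(n):
        u = rng.random()
        # A uniform draw from a disc: radius sigma * sqrt(V), uniform angle.
        r = sigma * math.sqrt(rng.random())
        theta = 2.0 * math.pi * rng.random()
        fx, fy = f(u)
        points.append((fx + r * math.cos(theta), fy + r * math.sin(theta)))
    return points

def circle(u, radius=0.5):
    """A closed filament: a circle, parameterized proportionally to arclength."""
    angle = 2.0 * math.pi * u
    return (radius * math.cos(angle), radius * math.sin(angle))

rng = random.Random(0)
data = sample_filament(circle, 500, 0.1, rng)
```

Every simulated point lies within $\sigma$ of the filament, so the sample stays inside the annulus of radii 0.4 and 0.6.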

Next we define a smoothness condition for $f$. For any three distinct points
$x, y, z$ on $\Gamma_f$ let $r(x, y, z)$ be the radius of the circle passing through the three
points. Define the thickness of the curve $\Gamma_f$ (Gonzalez and Maddocks (1999)),
denoted $\Delta = \Delta(f)$, by

(8) $\Delta = \Delta(f) = \Delta(\Gamma_f) = \min_{x,y,z} r(x, y, z)$

where the minimum is over all triples of distinct points on $\Gamma_f$. $\Delta$ is also
called the minimum global radius of curvature, the normal injectivity
radius of $f$, and the condition number (Niyogi et al. (2008)). The thickness $\Delta$
has the following interpretation: it is the minimum radius of all circles that
are tangent to one point of $\Gamma_f$ while passing through another point of $\Gamma_f$.
A ball of radius $r > \Delta$ tangent to a point $y$ on $\Gamma_f$ can contain points in $\Gamma_f$
other than $y$. This can occur because the radius of curvature of $\Gamma_f$ is smaller
than $r$ or because the curve comes within $r$ of self-intersecting. See Figure 2.
Hence the thickness combines information about curvature and separation,
capturing both local and global features of the curve. A useful way to think
of $\Delta$ is that it is the largest radius of a ball that can roll freely around $\Gamma_f$.
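
Definition (8) can be approximated on a discretized curve by taking the minimum circumradius over all triples of sampled points. The sketch below is a brute-force illustration under that assumption; since it searches only finitely many triples it can overestimate $\Delta$, and the helper names are hypothetical.

```python
import math
from itertools import combinations

def circumradius(x, y, z):
    """Radius of the circle through three points (infinite if collinear)."""
    a, b, c = math.dist(y, z), math.dist(x, z), math.dist(x, y)
    # Twice the triangle area, from the cross product of two edge vectors.
    area2 = abs((y[0] - x[0]) * (z[1] - x[1]) - (z[0] - x[0]) * (y[1] - x[1]))
    if area2 == 0.0:
        return math.inf
    # R = abc / (4 * area) = abc / (2 * area2).
    return a * b * c / (2.0 * area2)

def discrete_thickness(points):
    """Minimum circumradius over all triples of sampled curve points."""
    return min(circumradius(x, y, z) for x, y, z in combinations(points, 3))

# Points sampled from a circle of radius 0.5, whose thickness is 0.5.
ring = [(0.5 * math.cos(2 * math.pi * k / 40), 0.5 * math.sin(2 * math.pi * k / 40))
        for k in range(40)]
print(discrete_thickness(ring))
```

The cost grows cubically in the number of sampled points, so this is only practical for coarse discretizations.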

If $f(0) \ne f(1)$ we say that $f$ is open. If $f(0) = f(1)$ we say that $f$
is closed. If, for $u, v \in (0, 1)$, $u \ne v$ implies that $f(u) \ne f(v)$ then we
say that $f$ is simple, or non-self-intersecting. Otherwise, we say it is
self-intersecting. Unless stated otherwise, we assume that $f$ is smooth (non-zero,
finite gradient at every point) and simple. We assume that the filament is
parameterized with respect to arclength, normalized to $[0, 1]$.

Fig 2. A ball of radius $r < \Delta$ can roll freely (left). A ball of radius $r > \Delta$ cannot roll
freely because either it hits a region of high curvature (center) or it hits a region with a
near self-intersection (right).

We make the following assumptions: 

(A1) $H$ has density $h$ with respect to Lebesgue measure on $[0, 1]$ that is
bounded and bounded away from zero:

(9) $0 < c_1 \le \inf_{0 \le u \le 1} h(u) \le \sup_{0 \le u \le 1} h(u) \le c_2 < \infty$

for some $c_1, c_2$.
(A2) The noise distribution $F$ satisfies these conditions:

1. $F$ has support $B(0, \sigma)$.

2. $F$ has bounded continuous density $\phi$ with respect to Lebesgue
measure on $\mathbb{R}^2$ and $\phi(y) > 0$ for all $y$ in the interior of $B(0, \sigma)$.

3. $\phi$ is nonincreasing, that is, $\|u\| \le \|v\|$ implies that $\phi(u) \ge \phi(v)$.

4. $\phi$ is symmetric, i.e. $\|x\| = \|y\|$ implies that $\phi(x) = \phi(y)$.

5. There exists $0 \le \beta < \infty$ and $C > 0$ such that

$\phi(x) \sim C(\sigma - \|x\|)^{\beta}$ as $\|x\| \to \sigma$.

(A3) $f$ is sufficiently smooth, i.e., $\sigma < \Delta(f)$. If $f$ is open, then also
$\|f(1) - f(0)\|/2 > \Delta(f)$.

The parameter $\beta$ controls the behavior of $\phi$ near the boundary of its
support. The marginal density of $Y_i$ is $q(y) = \int \phi(y - f(u))\, dH(u)$. Let

(10) $S = \{y : q(y) > 0\}$

denote the support of $q$. It follows from assumption (A2) that

(11) $S = \bigcup_{0 \le u \le 1} B(f(u), \sigma)$.

We will let $Q = Q_{f,h,\sigma}$ denote the distribution of the data corresponding to
density $q$. The boundary behavior of $q$ is related to $\beta$. Let

(12) $\alpha = \beta + (1/2)$.

Lemma 1 There exist constants $c_1, c_2 > 0$ such that the following is true.
Let $y = (y_1, y_2)$ be in the interior of $S$. For small enough $d(y, \partial S)$ we have
that

(13) $c_1\, d(y, \partial S)^{\alpha} \le q(y) \le c_2\, d(y, \partial S)^{\alpha}$.

We remark that if the noise density is uniform on $B(0, \sigma)$, then $\alpha = 1/2$
and so $q$ is not uniform over its support. In fact, $q(y) = 0$ on $\partial S$.

Multiple filaments can be modeled by allowing $f$ to be piecewise
continuous instead of continuous. Multiple filaments can also be represented
as follows. Let $f_1, \ldots, f_k$ be a set of one dimensional curves in $\mathbb{R}^2$ where
$f_j : [0, 1] \to \mathbb{R}^2$, $j = 1, \ldots, k$. Let $Z_1, \ldots, Z_n$ be drawn from a distribution
on $\{1, \ldots, k\}$ and let $H_1, \ldots, H_k$ denote $k$ different distributions on $[0, 1]$.
For $i = 1, \ldots, n$ let

$Y_i = f_{Z_i}(U_i) + \epsilon_i$.

We can also extend the model to allow for clutter, as in Dasgupta and
Raftery (1998). Let $Q_0$ denote a uniform distribution on a compact set
$C \subset \mathbb{R}^2$ and define the mixture $(1 - \eta) Q_0 + \eta\, Q_{f,h,\sigma}$ where $0 < \eta \le 1$. We call
points drawn from $Q_0$ background clutter. Until Section 3.5, we will assume
no clutter is present (i.e., $\eta = 1$). Another generalization of the model is to
allow $f$ to be self-intersecting, which we consider briefly later.

3. Estimation. It will be helpful to first make some connections with 
some concepts from computational geometry. 
Fig 3. The Medial Axis. Top left: a set $S$. Top right: a non-medial ball contained in $S$.
Bottom left: a medial ball that touches the boundary of $S$ in 2 places. Bottom right: the
medial axis consists of the centers of the medial balls.

3.1. Some Background on Geometry. Let $S \subset \mathbb{R}^2$ be a compact set. A
ball $B \subset S$ is called medial if

1. $\mathrm{interior}(B) \cap \partial S = \emptyset$ and

2. $B \cap \partial S$ contains at least 2 points.

The medial axis $M = M(S)$, shown in Figure 3, is the closure of the set

(14) $\{x \in S : B(x, r) \text{ is medial for some } r > 0\}$.

See Dey (2006) and references therein for more information about the prop- 
erties of the medial axis. 

For each $u$ let $N(u)$ denote the normal vector at $f(u)$ and $T(u)$ the tangent
vector at $f(u)$. Define the fiber,

(15) $L(u) = \{f(u) + tN(u) : -\sigma \le t \le \sigma\}$

and the tube $\mathcal{T} = \bigcup_{0 \le u \le 1} L(u)$.

For open curves define the initial and final end caps, respectively, by

(16) $C_0 = B(f(0), \sigma) - \mathcal{T}$ and $C_1 = B(f(1), \sigma) - \mathcal{T}$.

When $f$ is a closed curve, the end caps are empty, and when $f$ is open with
$\|f(1) - f(0)\| > 2\sigma$, $C_0 \cap C_1 = \emptyset$.

The next lemma gives a useful decomposition of the support set S. 

Lemma 2 1. $S = \mathcal{T} \cup C_0 \cup C_1$, and in particular, when $f$ is closed, $S = \mathcal{T}$.

2. For every $u \ne v \in [0, 1]$, $L(u)$ and $L(v)$ are disjoint.

3. For every $y \in \mathcal{T}$, there exists a unique fiber containing $y$.

4. For every $y \in \mathcal{T}$, the closest point on $\partial S$ to $y$ is either $f(u) + \sigma N(u)$
or $f(u) - \sigma N(u)$.

5. When $f$ is closed $\partial S = \partial S_0 \cup \partial S_1$, when $f$ is open $\partial\mathcal{T} = \partial S_0 \cup \partial S_1$,
where

$\partial S_0 = \{f(u) + s(u)\sigma N(u) : 0 \le u \le 1\}$

and

$\partial S_1 = \{f(u) + t(u)\sigma N(u) : 0 \le u \le 1\}$

are two non intersecting connected curves where $s(u) \in \{-1, +1\}$ and
$t(u) = -s(u)$.

The following theorem relates the filament to its medial axis. 
Theorem 3 1. If $f$ is closed and $\sigma < \Delta(f)$ then $\Gamma_f = M(S)$.

2. If $f$ is open and $\sigma < \Delta(f)$ then $\Gamma_f \subset M(S)$. If, in addition,
$\sigma < \|f(1) - f(0)\|/2$ then $\Gamma_f = M(S)$.

This result holds both good news and bad news. The good news is that
$\Gamma_f = M(S)$, relating the filament to a well defined geometric quantity. The
bad news is that the medial axis is not continuous in Hausdorff distance.



Fig 4. A stylized example showing that small perturbations in $S$ can lead to large changes
in $M(S)$. The medial axis of a circle (left) is the center. If a small perturbation is added
to the circle (right) then the medial axis changes completely.

Small perturbations to $S$ give a completely different medial axis, as
illustrated in Figure 4. Thus, estimating the medial axis is non-trivial. From
now on, we assume that $\sigma < \Delta(f)$.
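The Euclidean distance transform (EDT) (Breu et al. (1995)) is a mapping
from $\mathbb{R}^2 \to [0, \infty)$ defined by $\Lambda(y) = d(y, \partial S)$. The next result gives
another characterization of the filament $\Gamma_f$: the filament maximizes $\Lambda(y)$. In
particular, $\Gamma_f = \{y \in S : \Lambda(y) = \sigma\}$.

Lemma 4 1. $y \in M(S)$ if and only if $\Lambda(y) = \sigma$.

2. For any $y \in S - M(S)$, $\Lambda(y) < \sigma$.

3. For any $y \in S$, $d(y, M(S)) + \Lambda(y) = \sigma$.

Let $\hat{S}$ be an estimate of $S$ and $\partial\hat{S}$ be an estimate of $\partial S$. For $y \in \mathbb{R}^2$,
define the empirical EDT by $\hat\Lambda(y) = d(y, \partial\hat{S})$. We estimate the noise level $\sigma$
by $\hat\sigma = \sup_{y \in \hat{S}} \hat\Lambda(y) = \hat\Lambda(\hat{y})$, where

(17) $\hat{y} = \mathrm{argmax}_{y \in \hat{S}}\, \hat\Lambda(y)$.

Theorem 5 Suppose that $d_H(\partial S, \partial\hat{S}) \le \epsilon$. Then:

1. $\sup_{y \in \mathbb{R}^2} |\Lambda(y) - \hat\Lambda(y)| \le \epsilon$.

2. $|\hat\sigma - \sigma| \le \epsilon$.

3. $d(\hat{y}, M(S)) \le 2\epsilon$.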

Following Cuevas and Rodriguez-Casal (2004), we say that a set $S$ is
$(\chi, \lambda)$-standard if there exist positive numbers $\chi$ and $\lambda$ such that

(18) $\nu(B(y, \epsilon) \cap S) \ge \chi\, \nu(B(y, \epsilon))$ for all $y \in S$, $0 < \epsilon \le \lambda$

where $\nu$ is Lebesgue measure. We say that $S$ is partly expandable if there
exist $r > 0$ and $R \ge 1$ such that $d_H(\partial S, \partial(S \oplus \epsilon)) \le R\epsilon$ for all $0 \le \epsilon < r$.
(Recall that $S \oplus \epsilon$ is the enlargement of $S$.) A standard set has no sharp
peaks while a partly expandable set has no deep inlets.

Lemma 6 $S$ is standard with $\chi = 1/4$ and $\lambda = \sigma$. Also, $S$ is partly
expandable with $R = 1$ and $r = \Delta - \sigma$.

3.2. Estimating Boundaries. We estimate the support $S$ and its boundary
$\partial S$. The estimate of $\partial S$ will be converted into an estimator of the
filament. The performance of these estimators, in Hausdorff-distance loss,
translates directly to the performance of the filament estimators. We use
$r_n$ to denote the rate of convergence of the boundary estimator; that is,
$d_H(\partial S, \partial\hat{S}) = O_P(r_n)$.

In practice, we will use the estimator from Cuevas and Rodriguez-Casal
(2004) and Devroye and Wise (1980), described in the following result. An
example is shown in Figure 5. This estimator is simple to use and fast to
compute. Recall that $\alpha = \beta + (1/2)$ where $\beta$ is defined in condition (A2).

Fig 5. These plots illustrate the estimators $\hat{S}$ and $\partial\hat{S}$. Left: A closed filament, data and
the true support. Center: The estimator of the support $\hat{S}$ is a union of balls. Right: The
boundary estimator.

Lemma 7 (Cuevas and Rodriguez-Casal (2004)). Let $Y_1, \ldots, Y_n$ be a random
sample from a distribution with support $S$. Let $S$ be compact,
$(\chi, \lambda)$-standard and partly expandable. Suppose the distribution $Q$ has positive
density $q$ and that for all $y \in S$, $q(y) \ge C d(y, \partial S)^{\alpha}$ for some $C > 0$ and some
$\alpha > 0$. Let

(19) $\hat{S} = \bigcup_{i=1}^{n} B(Y_i, \epsilon_n)$

and let $\partial\hat{S}$ be the boundary of $\hat{S}$. If $C > \sqrt{2/(\chi\pi)}$ and
$\epsilon_n = C(\log n/n)^{1/(2+\alpha)}$ then, with probability one

(20) $d_H(S, \hat{S}) \le r_n$ and $d_H(\partial S, \partial\hat{S}) \le r_n$

for all large $n$, where $r_n = C(\log n/n)^{1/(2+\alpha)}$. Also, $S \subset \hat{S}$ almost surely for
all large $n$.

Proof Outline. The proof is essentially the same as the proof in Cuevas
and Rodriguez-Casal (2004). They implicitly assume that $\inf_{y \in S} q(y) > 0$.
In particular their proof (see page 348 of their paper) argues that, for any
$y \in S$, $Q(B(y, \epsilon)) \ge c\epsilon^2$ for some $c > 0$. This is true under standardness and
assuming that $\inf_{y \in S} q(y) > 0$. However, we allow $q$ to be 0 at the boundary
and only require $q(y) \ge C d(y, \partial S)^{\alpha}$. In this case, by applying Lemma 1, we
have that $Q(B(y, \epsilon)) \ge c\epsilon^{2+\alpha}$. The result then follows as in their proof by
replacing $\epsilon^2$ with $\epsilon^{2+\alpha}$. □
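
As a worked illustration of the radius in Lemma 7, the sketch below evaluates $\epsilon_n = C(\log n/n)^{1/(2+\alpha)}$ with $\alpha = \beta + 1/2$ and $\chi = 1/4$. The lemma only requires $C > \sqrt{2/(\chi\pi)}$; the multiplicative `slack` used to pick a particular $C$ is an assumption made here for illustration.

```python
import math

def bandwidth(n, beta, chi=0.25, slack=1.01):
    """Radius eps_n = C * (log n / n)^(1 / (2 + alpha)), alpha = beta + 1/2.

    C is set to slack * sqrt(2 / (chi * pi)); slack > 1 is an arbitrary
    illustrative choice satisfying the strict inequality in the lemma.
    """
    alpha = beta + 0.5
    c = slack * math.sqrt(2.0 / (chi * math.pi))
    return c * (math.log(n) / n) ** (1.0 / (2.0 + alpha))

# Uniform noise on the disc corresponds to beta = 0, so the exponent is 2/5.
for n in (10**3, 10**4, 10**5):
    print(n, bandwidth(n, beta=0.0))
```

Larger $\beta$ (a noise density that vanishes faster at the edge of its support) gives a smaller exponent and hence a slower rate.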

We will also need the following property of the union-of-balls estimator $\partial\hat{S}$.

Lemma 8 Let $Y_1, \ldots, Y_n$ be a sample from $Q_{f,\sigma,h}$. If $f$ is open and if $S \subset \hat{S}$
then $\partial\hat{S}$ is a simple, closed curve. If $f$ is closed and if $S \subset \hat{S}$ then $\partial\hat{S}$ consists
of two simple, closed curves $\partial\hat{S}_0$ and $\partial\hat{S}_1$.

3.3. From Boundaries to Filaments. We now give two estimators of $\Gamma_f$
which we call the EDT estimator and the medial estimator. By condition
(A3), $\sigma < \Delta$ so that $\Gamma_f = M(S)$.

The first estimator is inspired by the fact that $\Gamma_f$ maximizes the EDT.
The second estimator is inspired by the following fact. For a closed curve,
$\partial S$ consists of two disjoint pieces $\partial S_0$ and $\partial S_1$ and the medial axis is midway
between $\partial S_0$ and $\partial S_1$.

The algorithm for the EDT estimator is as follows. An example is shown 
in Figure 6. 



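The EDT Estimator

Input: support and boundary estimates $\hat{S}$ and $\partial\hat{S}$ and a radius $\epsilon > 0$.

Output: a set of fitted values $\hat\Gamma$.

Algorithm:

1. Compute $\hat\Lambda(y) = d(y, \partial\hat{S})$, for all $y \in \hat{S}$.

2. Set $\hat\sigma = \max_{y \in \hat{S}} \hat\Lambda(y)$.

3. Let $\delta = 2\epsilon$ and set $\hat\Gamma = \{y \in \hat{S} : d(y, \partial\hat{S}) \ge \hat\sigma - \delta\}$.



We remark that the choice $\delta = 2\epsilon$ in the EDT procedure is mainly for
theoretical purposes. In practice, $\delta$ can be used as a tuning parameter.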
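
The three steps above can be sketched on a regular grid. This is an illustrative discretization, not the implementation used for the figures: the grid step, the 4-neighbour rule for the boundary, the deterministic ring of test points and the name `edt_estimator` are all assumptions.

```python
import math

def edt_estimator(data, eps_n, delta, step=0.05, lo=-1.0, hi=1.0):
    """Grid approximation of the EDT estimator; returns (fitted, sigma_hat).

    Assumes the union of balls is non-empty on the grid.
    """
    m = int(round((hi - lo) / step)) + 1
    coord = [lo + step * i for i in range(m)]
    # Support estimate: grid nodes within eps_n of some observation.
    inside = [[any(math.dist((coord[i], coord[j]), y) <= eps_n for y in data)
               for j in range(m)] for i in range(m)]

    def is_in(i, j):
        return 0 <= i < m and 0 <= j < m and inside[i][j]

    support = [(i, j) for i in range(m) for j in range(m) if inside[i][j]]
    # Boundary estimate: support nodes with a 4-neighbour outside the support.
    boundary = [(coord[i], coord[j]) for i, j in support
                if not all(is_in(i + di, j + dj)
                           for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)))]
    # Step 1: empirical EDT on the support nodes.
    edt = {(i, j): min(math.dist((coord[i], coord[j]), b) for b in boundary)
           for i, j in support}
    # Step 2: estimated noise level.
    sigma_hat = max(edt.values())
    # Step 3: keep nodes whose EDT is within delta of the maximum.
    fitted = [(coord[i], coord[j]) for (i, j), v in edt.items()
              if v >= sigma_hat - delta]
    return fitted, sigma_hat

# Deterministic stand-in for noisy data around a circle of radius 0.5.
data = [(r * math.cos(2 * math.pi * k / 60), r * math.sin(2 * math.pi * k / 60))
        for k in range(60) for r in (0.4, 0.45, 0.5, 0.55, 0.6)]
fitted, sigma_hat = edt_estimator(data, eps_n=0.06, delta=0.05)
```

On a grid the boundary is only resolved to within one grid step, so `sigma_hat` and the fitted set inherit an additional discretization error of that order.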

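Theorem 9 Let $\hat\Gamma = \{y \in \hat{S} : d(y, \partial\hat{S}) \ge \hat\sigma - \delta\}$ be the EDT estimator,
where $\delta = 2\epsilon$.

1. If $d_H(\partial S, \partial\hat{S}) \le \epsilon$, then $\Gamma_f \subset \hat\Gamma \subset \Gamma_f \oplus (4\epsilon)$, and $d_H(\Gamma_f, \hat\Gamma) \le 4\epsilon$.

2. If $\hat{S} = \bigcup_{i=1}^{n} B(Y_i, \epsilon_n)$ where $\epsilon_n = C(\log n/n)^{1/(2+\alpha)}$, $C > \sqrt{2/(\chi\pi)}$
and $\chi = 1/4$, then, with probability one,

(21) $d_H(\Gamma_f, \hat\Gamma) = O(r_n)$

for all large $n$, where $r_n = (\log n/n)^{1/(2+\alpha)}$.

Fig 6. These plots illustrate the EDT-based estimator. Left: filament and data. Center:
Estimated boundary. Right: EDT estimator $\hat\Gamma$.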

Now we consider the medial estimator. In this case, we estimate the fibers 
L{u) by joining points on opposite sides of the estimated boundary. The 
algorithm for constructing the medial estimator follows: 



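The Medial Estimator

Input: support and boundary estimates $\hat{S}$ and $\partial\hat{S}$, where $\partial\hat{S}$ consists of
two disjoint curves $\partial\hat{S}_0$ and $\partial\hat{S}_1$.

Output: a set of fitted values $\hat\Gamma$.

Algorithm:

1. For each $y \in \partial\hat{S}_0$, let $\tilde{y}$ be the closest point on $\partial\hat{S}_1$ and
let $\ell_y$ be the line segment connecting $y$ and $\tilde{y}$.

2. Set $\hat\mu(y)$ to be the midpoint of $\ell_y$.

3. Set $\hat\Gamma = \{\hat\mu(y) : y \in \partial\hat{S}_0\}$.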

We will focus on analyzing this algorithm for closed curves. The case of 
open curves is discussed in Section 6.2. 
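
A minimal sketch of the medial estimator follows, assuming the two boundary curves are already available as finite lists of sampled points. The function name and the concentric-circle example are illustrative assumptions.

```python
import math

def medial_estimator(boundary0, boundary1):
    """Midpoints between each point of boundary0 and its closest point on boundary1."""
    fitted = []
    for y in boundary0:
        closest = min(boundary1, key=lambda z: math.dist(y, z))
        fitted.append(((y[0] + closest[0]) / 2.0, (y[1] + closest[1]) / 2.0))
    return fitted

def circle_points(radius, k):
    """k equally spaced points on a circle centred at the origin."""
    return [(radius * math.cos(2 * math.pi * i / k),
             radius * math.sin(2 * math.pi * i / k)) for i in range(k)]

# Boundaries of the support of a circular filament with radius 0.5, sigma 0.1.
inner = circle_points(0.4, 90)
outer = circle_points(0.6, 90)
fitted = medial_estimator(inner, outer)
```

With exact boundaries the midpoints fall on the filament. With estimated boundaries, the closest point on the opposite curve can jump between nearby fibers, which is one way to see why the fitted values have gaps.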

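Theorem 10 Let $\hat\Gamma$ be the medial estimator. Then:

1. If $d_H(\partial S_0, \partial\hat{S}_0) \le \epsilon$ and $d_H(\partial S_1, \partial\hat{S}_1) \le \epsilon$, with $\epsilon < (\Delta - \sigma)/2$, then

(i) For every $\hat\mu \in \hat\Gamma$ there is a filament point $f(u) \in \Gamma_f$ such that
$\|\hat\mu - f(u)\| \le 2\epsilon$.

(ii) There exists $C > 0$ such that, for each $f(u) \in \Gamma_f$ there is $\hat\mu \in \hat\Gamma$
such that $\|\hat\mu - f(u)\| \le C\sqrt{\epsilon}$.

(iii) $d_H(\hat\Gamma, \Gamma_f) = O(\sqrt{\epsilon})$.

2. If $\hat{S} = \bigcup_{i=1}^{n} B(Y_i, \epsilon_n)$ where $\epsilon_n = C(\log n/n)^{1/(2+\alpha)}$, $C > \sqrt{2/(\chi\pi)}$
and $\chi = 1/4$, then, with probability one, for all large $n$,

(22) $d_H(\Gamma_f, \hat\Gamma) = O(\sqrt{r_n})$

where $r_n = (\log n/n)^{1/(2+\alpha)}$.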

An example is in Figure 7. The medial estimator has a slower rate of
convergence than the EDT estimator. However, Lemma 11 and Theorem 12
below show that it is easy to extract a curve from the fitted values. The
extracted curve has the faster rate $r_n$ rather than $\sqrt{r_n}$.

Let $\hat\Gamma$ be the medial estimator and assume that $f$ is closed. (The case
where $f$ is open is considered in Subsection 6.2.) The fitted values $\hat\Gamma$ are
derived from the estimated boundary $\partial\hat{S}$. These fitted values have gaps.
All we have to do is connect the gaps with straight lines to get a curve.
Surprisingly, this also improves the rate of convergence. Here are the details.

Recall that, from Lemma 8, $\partial\hat{S} = \partial\hat{S}_0 \cup \partial\hat{S}_1$ and that $\partial\hat{S}_0$ is a closed
simple curve. The medial estimator takes each point $y \in \partial\hat{S}_0$ and outputs
a fitted value $\hat\mu(y)$. Let $g$ be a parameterization of $\partial\hat{S}_0$, so $\partial\hat{S}_0 = \{g(u) :
0 \le u \le 1\}$. Define $\hat{f}(u) = \hat\mu(g(u))$.

Lemma 11 The function $\hat{f} : [0, 1] \to \mathbb{R}^2$ is a union of open curves. In
particular, there exist $0 = a_0 < a_1 < \cdots < a_m = 1$ such that $\hat{f}$ is a
continuous, open curve on each $(a_j, a_{j+1})$ but $\hat{f}$ is possibly discontinuous at
each $a_j$.

Now we define $f^*$ as follows. In general, $\hat{f}(a_j^-) \ne \hat{f}(a_j^+)$. We define $f^*$ to
be the curve obtained by joining $\hat{f}(a_j^-)$ and $\hat{f}(a_j^+)$ by linear interpolation.
We call $\Gamma^* = \{f^*(u) : 0 \le u \le 1\}$ the completed medial estimator.

Theorem 12 $f^*$ is a simple, closed curve. Furthermore, $d_H(\Gamma_{f^*}, \Gamma_f) =
O_P(r_n)$.

Multiple Filaments. Suppose now that there are finitely many filaments
$f_1, \ldots, f_k$. First suppose that $d_{\min}(\Gamma_{f_j}, \Gamma_{f_l}) > 2\sigma$ for all $j \ne l$ where
$d_{\min}(A, B) = \min_{x \in A, y \in B} \|x - y\|$. The properties of $\hat{S}$ guarantee that for
large enough $n$, $\hat{S}$ will consist of disjoint, connected sets $\hat{S}_1, \ldots, \hat{S}_k$.

Corollary 13 Suppose that $\sigma < \min_j \Delta(f_j)$, where $\Delta(f_j)$ denotes the
thickness of the curve $f_j$, and that $d_{\min}(\Gamma_{f_j}, \Gamma_{f_l}) > 2\sigma$ for all $j \ne l$. If the EDT
or medial procedure is applied then

where $r_n$ is as before.

When the condition $d_{\min}(\Gamma_{f_j}, \Gamma_{f_l}) > 2\sigma$ fails, then the curves can get
close to each other or even could be self-intersecting. In that case, we cannot
claim to estimate the entire curve well. However, we can estimate the
well-separated portions of the curves. Let $\Gamma = \bigcup_{j=1}^{k} \Gamma_{f_j}$. For each $y \in \Gamma$ let
$N(y) = \{j : B(y, 2\sigma) \cap \Gamma_{f_j} \ne \emptyset\}$. Let $\Gamma_0 = \{y \in \Gamma : |N(y)| = 1\}$.

Corollary 14 Suppose that $\sigma < \min_j \Delta(f_j)$. If either the EDT or medial
procedures are applied then

$d_H(\Gamma_0, \hat\Gamma) = O_P(r_n)$

where $r_n = \sqrt{\log n/n}$ for the EDT estimator and $r_n = (\log n/n)^{1/4}$ for the
medial estimator.

3.4. Extracting a curve from the EDT estimator. Now we discuss how to
extract a curve from the fitted values. We assume that we have already
computed the union of balls estimator $\hat{S}$ with an appropriate choice of $\epsilon_n$
and hence that $d_H(S, \hat{S}) \le C r_n$ and $d_H(\partial S, \partial\hat{S}) \le C r_n$ for some $C > 0$.

Let $\hat\Gamma$ denote the fitted values from the EDT estimator. Our goal is to use
$\hat\Gamma$ to find a curve $\hat{f}$ such that $d_H(\Gamma_{\hat{f}}, \Gamma_f) \le C d_H(\partial S, \partial\hat{S})$. Such a curve $\hat{f}$ can be
identified both for open and closed filaments. The precise statement is given
in Theorem 15 below.

More informally, recall first that from Theorem 9, $\Gamma_f \subset \hat\Gamma \subset \Gamma_f \oplus (4\epsilon)$. Now,
when $f$ is an open filament, from Lemma 2, $S = \mathcal{T} \cup C_0 \cup C_1$. Thus, let
$y_0 \in \hat\Gamma \cap C_0$ and $y_1 \in \hat\Gamma \cap C_1$ be points in $\hat\Gamma$ and the two end-caps of $S$. Any
curve $\Gamma_{\hat{f}}$ between $y_0$ and $y_1$ that lies entirely in $\hat\Gamma$ must cut through every
fiber in $\mathcal{T}$ at a distance at most $4\epsilon$ and it is at most $4\epsilon$ from the end points
$f(0)$ and $f(1)$. Hence $d_H(\Gamma_{\hat{f}}, \Gamma_f) \le 4\epsilon$.

When, instead, $f$ is a closed filament, let $y_0$ be a point in $S^c$ surrounded
by $\partial S_0$. Any closed curve $\Gamma_{\hat{f}}$ that lies entirely within $\hat\Gamma$ and has winding
number 1 with respect to $y_0$ cuts through every fiber in $\mathcal{T}$ at a distance at
most $4\epsilon$ from $\Gamma_f$. In this case too $d_H(\Gamma_{\hat{f}}, \Gamma_f) \le 4\epsilon$.

The extraction algorithm is based on the remarks above. In the open
filament case, because $C_0$ and $C_1$ are unknown, we replace $y_0$ and $y_1$ by
estimated end-points $x_0$ and $x_1$ that maximize the minimum path length
between two points in $\hat\Gamma$, as illustrated later in Subsection 6.3. In the closed
filament case we use a slightly different implementation, that generalizes
more readily to the case where it is not known if the filament is open or
closed.



EDT Curve Extraction Algorithm

Input: EDT estimate $\hat\Gamma$ and corresponding $\epsilon > 0$, and constraint sets $\mathcal{E}_0$
and $\mathcal{E}_1$ ($\mathcal{E}_0 = \mathcal{E}_1 = \mathbb{R}^2$ by default).

Output: the graph of a curve $\Gamma_{\hat{f}}$.

Algorithm (Open-Curve Case):

1. Find end points $x_0$ and $x_1$ satisfying

(23) $x_0, x_1 = \mathrm{argmax}_{u \in \hat\Gamma \cap \mathcal{E}_0,\, v \in \hat\Gamma \cap \mathcal{E}_1}\ \min_{\pi \in \mathcal{P}_{u,v}} \mathrm{length}(\pi)$,

where $\mathcal{P}_{u,v}$ is the set of paths in $\hat\Gamma$ from $u$ to $v$. In practice, this is
accomplished by constructing a $\xi$-net of points in $\hat\Gamma$ with $0 < \xi < \epsilon/4$;
forming the minimum spanning tree of this net; and finding the points
that maximize the minimum path length in the tree.

2. Join the end points by a curve in $\hat\Gamma$. In practice, this is obtained from
the minimum spanning tree via Dijkstra's algorithm (Dijkstra (1959)).

3. (Optional) Relax the path to thickness $\Delta$ as follows: for each successive
triple of points $(y_{i-1}, y_i, y_{i+1})$ on the path, shrink $y_i$ as close to
$(y_{i+1} + y_{i-1})/2$ as possible while remaining in $\hat\Gamma$. Iterate until the
reduction in thickness is below a fixed threshold.

Algorithm (Closed-Curve Case):

1. Fix $0 < \eta \ll \epsilon$.

2. Let $\hat{y}$ be the point defined in equation (17) that determines $\hat\sigma$.

3. Let $A_8$ be the union of all line segments through $\hat{y}$ with end points on
$\partial\hat\Gamma$ and whose length is $\le 8\epsilon$.

4. Define $A = (A_8 \cap \hat\Gamma) \oplus \eta$.

5. Apply the open-curve algorithm to $\hat\Gamma - A$ with the constraint that
the end points of the curve, $x_0$ and $x_1$, must both lie on $\partial A$ (i.e., set
$\mathcal{E}_0 = \mathcal{E}_1 = \partial A$).

6. Join $x_0$ and $x_1$ by a curve contained within $A$, producing a single
closed curve.

Algorithm (General-Curve Case):

1. Construct $A$ as in the closed-curve algorithm.

2. If $\hat\Gamma - A$ has one connected component, continue with the closed-curve
algorithm. (This can, for instance, be determined using a friends-of-friends
algorithm with a threshold distance of $\eta$ from the closed-curve algorithm.)

3. Otherwise, $\hat\Gamma - A$ must have two connected components. Do the
following:

(a) Apply the open-curve algorithm to each component with the
constraint that one of the end points in each component must
lie on the boundary of $A$ (i.e., $\mathcal{E}_0 = \mathbb{R}^2$ and $\mathcal{E}_1 = \partial A$ for the first
component and vice versa for the second).

(b) Join the endpoints on the boundary of $A$ with any path through
$A$ to create a single curve.



For the open-curve case, specification of $\xi$ is arbitrary. Smaller $\xi$ give
larger nets and lead to more convoluted initial paths but allow more effective
smoothing in the relaxation step. The minimum spanning tree end points
can be refined by using the expected hitting times for a random walk on the
$\xi$-net. Restricting the random walk to suitably small steps of order $\epsilon$ gives a
sparse transition matrix. The expected hitting time from one end point to
all other points can be maximized to refine the other end point and so on,
alternating end points. This process tends to converge rather quickly and
produces better results in practice. Relaxation is optional but must be used
if a smooth curve is desired.

For the closed curve case, the choice of $\eta$ is again arbitrary; a non-zero
value is needed to provide clean separation. The set $A$ can be replaced in
practice with the intersection of $\hat\Gamma$ and a ball of radius $6\epsilon$ around $\hat{y}$, which
is easier to compute, if somewhat more conservative.
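
The practical recipe for the open-curve case (build a net, form its minimum spanning tree, take the pair of tree nodes with the longest connecting path, and read off that path) can be sketched as follows. This is a simplified illustration: it omits the constraint sets, the hitting-time refinement and the relaxation step, it uses plain tree traversal because paths in a tree are unique, and all function names and the toy net are assumptions.

```python
import math

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns adjacency lists."""
    n = len(points)
    adj = {i: [] for i in range(n)}
    # best[i] = (distance to the current tree, closest tree node).
    best = {i: (math.dist(points[0], points[i]), 0) for i in range(1, n)}
    while best:
        i = min(best, key=lambda k: best[k][0])
        w, parent = best.pop(i)
        adj[i].append((parent, w))
        adj[parent].append((i, w))
        for k in best:
            d = math.dist(points[i], points[k])
            if d < best[k][0]:
                best[k] = (d, i)
    return adj

def farthest(adj, start):
    """Traverse the tree from start; return the farthest node and parent links."""
    dist, parent, stack = {start: 0.0}, {start: None}, [start]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in dist:
                dist[v], parent[v] = dist[u] + w, u
                stack.append(v)
    return max(dist, key=dist.get), parent

def extract_open_curve(points):
    """Longest path in the minimum spanning tree, as an ordered list of points."""
    adj = minimum_spanning_tree(points)
    x0, _ = farthest(adj, 0)        # one end of the longest tree path
    x1, parent = farthest(adj, x0)  # the other end, with links back to x0
    path = [x1]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return [points[i] for i in reversed(path)]

# A small zig-zag net, listed in scrambled order (toy data).
order = [5, 0, 9, 3, 7, 1, 10, 4, 8, 2, 6]
net = [(0.1 * i, 0.01 * (-1) ** i) for i in order]
curve = extract_open_curve(net)
```

The two-pass search for the farthest node is the standard way to find the longest path in a weighted tree. Prim's algorithm on the complete graph is quadratic in the net size, which is adequate for small nets only.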

The following theorem shows that the algorithm produces curves with the 
desired properties. 

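Theorem 15 Let $\Gamma_{\hat{f}}$ denote the curve extracted from the EDT estimator by
the algorithm described above. Assume that $d_H(\partial S, \partial\hat{S}) \le \epsilon$. Then,

1. If $f$ is closed, $d_H(\Gamma_{\hat{f}}, \Gamma_f) \le 4\epsilon$.

2. If $f$ is open, $d_H(\Gamma_{\hat{f}}, \Gamma_f) \le 16\epsilon$.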

An example of curve extraction is shown in Figure 10. 

3.5. Decluttering. Assume now that $Y_i$ has density $m(y) = (1 - \eta) q_0(y) +
\eta\, q(y)$ where $q_0$ is the uniform density over a compact set $C$ and $q$ is the
density of points from the filament. We assume that $S \subset C$ where $S$ is the
support of $q$. Thus, $q_0(x) = I(x \in C)/V$ where $V$ is the area of $C$.

Let $Z_i = 1$ if $Y_i$ is from $q$ and $Z_i = 0$ if $Y_i$ is from $q_0$. To identify clutter,
we want to find a classifier $c(y)$ where $c(Y) = 1$ means that we guess that
$Z = 1$ and $c(Y) = 0$ means that we guess that $Z = 0$.

The best classifier is the Bayes rule

(24) $c_*(y) = I(P(Z_i = 1 \mid Y_i = y) > 1/2) = I(m(y) > 2(1 - \eta)\, q_0(y))$.

The Bayes rule is not identifiable. Since $1 - \eta < 1$, a conservative
approximation to the Bayes rule is

(25) $c(y) = I(m(y) > 2 q_0(y))$.

An estimate of $c$ is $\hat{c}(y) = I(\hat{m}(y) > 2 q_0(y))$ where $\hat{m}$ is a density estimator
obtained from $Y_1, \ldots, Y_n$. In practice we use a kernel density estimator. We
can now apply the previous filament algorithms to the decluttered data set

(26) $\{Y_i : \hat{c}(Y_i) = 1\}$.

An investigation into the properties of this decluttering process is beyond the
scope of this paper and will be reported elsewhere. However, we will illustrate
the decluttering procedure in the examples and show that it appears to
perform well in practice.
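
The plug-in rule, which keeps a point when the estimated density exceeds $2 q_0 = 2/V$, can be sketched with a simple kernel density estimator. The Gaussian kernel, the fixed bandwidth `h`, evaluating the estimate at the data points without leaving the point out, and the toy data are all assumptions made for illustration.

```python
import math

def kde(y, data, h):
    """Gaussian kernel density estimate at y with bandwidth h (2-d data)."""
    norm = 1.0 / (len(data) * 2.0 * math.pi * h * h)
    return norm * sum(math.exp(-((y[0] - a) ** 2 + (y[1] - b) ** 2) / (2.0 * h * h))
                      for a, b in data)

def declutter(data, area, h):
    """Keep the points whose estimated density exceeds twice the clutter density."""
    q0 = 1.0 / area
    return [y for y in data if kde(y, data, h) > 2.0 * q0]

# Toy data: 200 points on a circle plus 5 isolated clutter points in [-1, 1]^2.
circle = [(0.5 * math.cos(2 * math.pi * k / 200), 0.5 * math.sin(2 * math.pi * k / 200))
          for k in range(200)]
clutter = [(0.95, 0.95), (-0.95, 0.95), (0.95, -0.95), (-0.95, -0.95), (0.0, 0.0)]
kept = declutter(circle + clutter, area=4.0, h=0.1)
```

In this toy configuration the isolated points are removed and the points on the circle are kept. With real data the outcome depends on the bandwidth, which acts as a tuning parameter.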

4. Examples. We have tested our procedures on a few simulated data- 
sets. We start by considering two smooth filaments, one open and the other 
closed. In the first example the two filaments are well separated (top left 
panel in Figure 9) while in the second dataset the two filaments intersect 
(top left panel in Figure 11). The third example considers 12 different smooth 
open filaments, with several intersections. 

Note that the condition on the radius of curvature fails to hold in presence 
of intersections between filaments, thus only the first dataset satisfies the 
conditions of this paper completely. 

In all the examples we have chosen $\epsilon_n$ according to the suggestion in
Cuevas and Rodriguez-Casal (2004) as follows:

(27) $\epsilon_n = \max_{1 \le i \le n} \min_{j \ne i} \|Y_i - Y_j\|$.
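
Rule (27) is the largest nearest-neighbour distance in the sample. A direct quadratic-time sketch follows; the function name and the four-point example are illustrative assumptions.

```python
import math

def max_min_distance(data):
    """Largest nearest-neighbour distance: max over i of min over j != i."""
    return max(min(math.dist(y, z) for j, z in enumerate(data) if j != i)
               for i, y in enumerate(data))

sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
print(max_min_distance(sample))
```

With this radius every ball in the union-of-balls estimator contains at least one other observation, so no data point is left isolated. A single far-away point, such as a clutter point, can inflate the value.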

The first two datasets contain 1500 points: 500 on each filament
and 500 points of background clutter (top right panels in Figures 9 and 11).

A summary of the results from the decluttering procedure is given in
Figure 8 for both datasets. The procedure seems to work well in separating
filament from clutter points.

Fig 8. Summary of decluttering on the first dataset (left) and second dataset (right).



The filaments were estimated with the EDT and the Medial Estimator
methods of subsection 3.3, applied to the decluttered datasets. The
estimated filaments obtained for the first dataset are very close to the true
filaments (bottom panels in Figure 9).

We applied the curve extraction procedure of subsection 3.4 to the EDT 
estimator shown in the bottom left panel of Figure 9. Figure 10 shows the 
extracted curves. 

For the second dataset (bottom panels in Figure 11) the medial estimator
fails to detect the true filament near the intersection and becomes more and
more accurate as it moves away from the intersection. Considering that the
condition on the radius of curvature is violated, even in the second dataset
the estimate seems to be quite satisfactory.

Fig 9. First example. Top line: true curves and the support of the distribution (left), the
data (right). Center line: points identified as clutter (left), decluttered data (right). Bottom
line: EDT estimator (left). Medial estimator (right).

Fig 10. First example. Curves extracted from the EDT estimator. Data with background
clutter overlaid.

The third dataset is more challenging as it contains 12 filaments, with 
several intersections. Eighty points were generated from each filament and 
350 more points were generated as background clutter, for a total of n = 1310 
data points (top panels in Figure 12). The decluttering procedure (central 
panels in Figure 12) resulted in 989 points marked as filament (34 of which 
were generated as clutter) and 321 points marked as clutter (5 of which 
were filament points). The estimates, obtained from the points marked as 
filament, are shown in the bottom panel of Figure 12. These estimates are 
accurate for filaments with no intersections. The accuracy is less satisfactory 
for intersecting filaments or for filaments that are too close to each other. 
This was to be expected, as the condition on the radius of curvature is not 
satisfied in these cases. 
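The counts reported for the third dataset can be cross-checked with a few lines of arithmetic. The sketch below only re-derives totals from the numbers quoted above; the function name and the returned dictionary keys are illustrative assumptions.

```python
def declutter_summary(n_filament, n_clutter,
                      marked_filament, clutter_in_marked_filament,
                      marked_clutter, filament_in_marked_clutter):
    """Cross-check decluttering counts and return simple accuracy figures."""
    n = n_filament + n_clutter
    if marked_filament + marked_clutter != n:
        raise ValueError("marked counts do not add up to n")
    filament_correct = marked_filament - clutter_in_marked_filament
    clutter_correct = marked_clutter - filament_in_marked_clutter
    if filament_correct + filament_in_marked_clutter != n_filament:
        raise ValueError("filament counts are inconsistent")
    if clutter_correct + clutter_in_marked_filament != n_clutter:
        raise ValueError("clutter counts are inconsistent")
    errors = clutter_in_marked_filament + filament_in_marked_clutter
    return {
        "n": n,
        "filament_correct": filament_correct,
        "clutter_correct": clutter_correct,
        "misclassified": errors,
        "error_rate": errors / n,
    }


# Third dataset: 12 filaments of 80 points each plus 350 clutter points.
summary = declutter_summary(12 * 80, 350, 989, 34, 321, 5)
```

The reported figures are internally consistent: 39 of the 1310 points (about 3%) are misclassified.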

5. Discussion. In recent work (Genovese et al. (2010)) we found the 
minimax rate for this problem under restrictive conditions (but in general 
dimensions). In current work, we are finding the minimax rates in general. 




Fig 11. Second example. Top line: true curves and the support of the distribution (left), 
the data (right). Center line: points identified as clutter (left), decluttered data (right). 
Bottom line: EDT estimator (left). Medial estimator (right). 
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Fig 12. Third example. Top line: true curves and the support of the distribution (left), 
the data (right). Center line: points identified as clutter (left), decluttered data (right). 
Bottom line: EDT estimator (left). Medial estimator (right). 
This is a difficult problem because the rate depends critically on features of
the noise distribution F. Moreover, the problem is essentially a deconvolution
problem since the variables $f(U_i)$ are unobserved and corrupted
by noise. We will report on these results elsewhere.

The estimators presented here are not minimax but are appealing be- 
cause of their simplicity. Finding a practical estimator that achieves the 
minimax rate is an open question. Our approach, instead, consists of two
steps: producing a set of fitted values $\hat{\Gamma}$ and then extracting a curve from
$\hat{\Gamma}$. We gave two specific methods for obtaining the fitted values and a curve
extraction method for each of the two approaches. The resulting estimators
have reasonably fast rates of convergence.

The noise model is critical. We assumed compact support which is rea- 
sonable for many applications. Without compact support, the behavior of 
the methods changes substantially as it does in nonparametric measurement 
error problems. 

It is interesting to compare our results to those in Cheng et al. (2005). 
They show that each of their fitted values is Op((log n/n)^/^) from the 
filament. Under weaker conditions than they assumed, we get a rate which
is faster as long as $\sigma$ is not too large. (They implicitly assume that $\sigma = 0$.)
Also, our rate is in Hausdorff distance which is a stronger notion of closeness 
than used in their paper. 

Currently, we are pursuing several extensions of our results. These include:
the aforementioned extensions to higher dimensions (manifold learning), relaxing
the smoothness condition, relaxing the constant $\sigma$ condition, noise
distributions with non-compact support, and comparisons with beamlets.
We are also investigating data-driven methods for choosing the tuning parameter
$\epsilon$ and we are studying the theoretical properties of the decluttering
technique.
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6. Supplementary Material. 

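6.1. Proofs. Proof of Lemma 1. For $y \in B(0, \sigma)$ the density $\phi$ satisfies

(28) $\phi(y) \ge C_1\, d(y, \partial B(0, \sigma))^{\beta} = C_2 \left(1 - \frac{\|y\|}{\sigma}\right)^{\beta}.$

Note also that monotonicity of $\phi$ implies that $\phi(y) \le \phi(0)$.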

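Let $d = d(y, \partial S)$ and let $y_0 \in \partial S$ be the point on $\partial S$ closest to $y = (y_1, y_2)$.
Without loss of generality, assume that $y_0 = (0, 0)^T$ and that the tangent
vector to $\partial S$ at $y_0$ is $(1, 0)^T$. We now prove that $q(y_2 \mid y_1) \ge C_4\, d(y, \partial S)^{\beta + 1/2}$.
In Lemma 2 we show that $S = \bigcup_{0 \le u \le 1} L(u)$, where $L(u)$ is defined in (15)
as $L(u) = \{f(u) + tN(u) : -\sigma \le t \le \sigma\}$ and $N(u)$ is the normal vector
at $f(u)$. Moreover, we show that the $L(u)$'s are disjoint. Let $u \in [0, 1]$ be such
that $y \in L(u)$, hence $\|y - f(u)\| = \sigma - d$. Continuity of $f$ implies that there
exists an interval $(u', u'') \subset [0, 1]$ such that $\|y - f(u)\| \le \sigma - d/2$ for all $u$ in
the interval. We will show later that $|u' - u''| \ge C_5 \sqrt{d}$. We can write the
density at $y = (y_1, y_2)$ as

$$q(y) = \int_{\{u : \|y - f(u)\| \le \sigma\}} \phi(y - f(u))\, h(u)\, du$$

and the conditional density as

$$q(y_2 \mid y_1) = \frac{\int_{\{u : \|y - f(u)\| \le \sigma\}} \phi(y - f(u))\, h(u)\, du}{\int_{\{y_2 : (y_1, y_2) \in S\}} \int_{\{u : \|y - f(u)\| \le \sigma\}} \phi(y - f(u))\, h(u)\, du\, dy_2}.$$

The denominator is bounded from above by $\phi(0) C_6$. Hence,

$$\begin{aligned}
q(y_2 \mid y_1) &\ge C_7 \int_{\{u : \|y - f(u)\| \le \sigma\}} \phi(y - f(u))\, h(u)\, du \ge C_7 \int_{u'}^{u''} \phi(y - f(u))\, h(u)\, du \\
&\ge C_8 \int_{u'}^{u''} \left(1 - \frac{\|y - f(u)\|}{\sigma}\right)^{\beta} h(u)\, du \ge C_8 \left(1 - \frac{\sigma - d/2}{\sigma}\right)^{\beta} \int_{u'}^{u''} h(u)\, du \\
&= C_8 \left(\frac{d}{2\sigma}\right)^{\beta} \int_{u'}^{u''} h(u)\, du \ge C\, d^{\beta}\, |u'' - u'| \ge C_9\, d^{\beta + 1/2}.
\end{aligned}$$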



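Now we show that $|u' - u''| \ge C_5 \sqrt{d}$. Let $z = d/2$; $\|f(u') - f(u'')\|$ is bounded
below by the distance of the intersection of the two balls $B((0, 0), \Delta)$ and
$B((\Delta + \sigma - 2z, 0), \sigma - z)$. Some algebra shows that

$$\|f(u') - f(u'')\| \ge 2\sqrt{z}\, \sqrt{\frac{2\Delta\sigma}{\Delta + \sigma}} = 2\sqrt{d}\, \sqrt{\frac{\Delta\sigma}{\Delta + \sigma}}.$$

Finally, since

$$\|f(u') - f(u'')\| = \left\| \int_{u'}^{u''} \nabla f(u)\, du \right\| \le \sup_{u \in [0, 1]} \|\nabla f(u)\|\, |u' - u''|$$

we obtain

$$|u' - u''| \ge \frac{2\sqrt{d}}{\sup_{u \in [0, 1]} \|\nabla f(u)\|}\, \sqrt{\frac{\Delta\sigma}{\Delta + \sigma}} = C_5 \sqrt{d}.$$

It is easy to see that $q(y_1) \ge c > 0$ for all $y \in B(y_0, \epsilon)$. Hence, $q(y) = q(y_2 \mid y_1)\, q(y_1) \ge c\, C_4\, d(y, \partial S)^{\beta + 1/2}$.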

Now we find the upper bound. Let $d = d(y, \partial S)$. Let $u_0$ be such that
$y \in L(u_0)$. Now

$$q(y) = \int_{u'}^{u''} \phi(y - f(u))\, h(u)\, du \le C_2 \int_{u'}^{u''} \phi(y - f(u))\, du \le C_2\, \phi(y - f(u_0))\, |u'' - u'| \le C_2 C\, d^{\beta}\, |u'' - u'|.$$

Earlier we showed that $|u'' - u'| \ge C_5 \sqrt{d}$. By a similar argument, $|u'' - u'| \le c_5 \sqrt{d}$
for some $c_5$. The result follows. □
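The "some algebra" step in the proof of Lemma 1 concerns the common chord of the balls $B((0,0), \Delta)$ and $B((\Delta + \sigma - 2z, 0), \sigma - z)$. The sketch below computes that chord exactly and compares it with the expression $2\sqrt{z}\sqrt{2\Delta\sigma/(\Delta+\sigma)}$, which is our reading of the garbled display and should be treated as an assumption. The function names are hypothetical.

```python
import math


def lens_chord(delta, sigma, z):
    """Exact length of the common chord of the two circles
    centred at (0, 0) with radius delta and at (delta + sigma - 2z, 0)
    with radius sigma - z."""
    r1, r2 = delta, sigma - z
    dist = delta + sigma - 2.0 * z
    # x-coordinate of the chord (radical line of the two circles).
    x0 = (dist ** 2 + r1 ** 2 - r2 ** 2) / (2.0 * dist)
    half_sq = r1 ** 2 - x0 ** 2
    if half_sq < 0:
        raise ValueError("the two balls do not intersect")
    return 2.0 * math.sqrt(half_sq)


def leading_order_chord(delta, sigma, z):
    """Assumed closed form: 2 * sqrt(z) * sqrt(2 * delta * sigma / (delta + sigma))."""
    return 2.0 * math.sqrt(z) * math.sqrt(2.0 * delta * sigma / (delta + sigma))
```

For small $z$ the two quantities agree to leading order, which is all the $\sqrt{d}$ rate requires; for larger $z$ the exact chord can be somewhat shorter, so the displayed inequality is best read up to a constant.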

Proof of Lemma 2. 

1. First, consider the closed case. We show that $S = T$. Suppose not.
Then there is a $y \in S$ such that

$y \ne f(u) + tN(u)$

for any $u \in [0, 1]$ and $t \in [-\sigma, \sigma]$. Let $f(u)$ be the closest point on the
curve to $y$. Since $y \notin L(u)$, $\langle y - f(u), T(u) \rangle \ne 0$. Without loss of generality,
suppose that $\langle y - f(u), T(u) \rangle > 0$. So, for sufficiently small $\epsilon$,

$$\begin{aligned}
\|y - f(u)\|^2 &\le \|y - f(u + \epsilon)\|^2 = \|y - f(u) - \epsilon T(u)\|^2 + o(\epsilon^2) \\
&= \|y - f(u)\|^2 + \epsilon^2 - 2\epsilon \langle y - f(u), T(u) \rangle + o(\epsilon^2) \\
&< \|y - f(u)\|^2,
\end{aligned}$$

which is a contradiction. For the open case, the balls $B(f(0), \sigma)$ and $B(f(1), \sigma)$
do not intersect. Both balls are contained in $S$. The half planes formed by
the normal vectors at $f(0)$ and $f(1)$ split these balls in two, with half of
each in $T$ and half in $T^c$. The result follows.

2. Now we show that $u \ne v \in [0, 1]$ implies that $L(u) \cap L(v) = \emptyset$. Suppose
that $L(u)$ and $L(v)$ intersect at some point $y$. So

$y = f(u) + sN(u) = f(v) + tN(v)$

for some $s, t \in [-\sigma, \sigma]$. Let $A_u$ be the ball of radius $\Delta$ tangent to $f(u)$ and
containing $y$. Let $A_v$ be the ball of radius $\Delta$ tangent to $f(v)$ and containing
$y$. Note that $f(v) \notin A_u$ and $f(u) \notin A_v$. (This follows from the discussion
after (8).) Now $f(v) \notin A_u$ implies that $t \ge s$ but $f(u) \notin A_v$ implies that
$s \ge t$ and so $s = t$. So, $y = f(u) + sN(u) = f(v) + sN(v)$. By the triangle
inequality,

$$\begin{aligned}
d(f(v), \mathrm{center}(A_u)) &= \|f(u) + \Delta N(u) - f(v)\| \\
&\le \|f(u) + \Delta N(u) - y\| + \|y - f(v)\| \\
&= \|(\Delta - s) N(u)\| + \|s N(v)\| \\
&= \Delta.
\end{aligned}$$

But $f(v) \notin A_u$ means that $d(f(v), \mathrm{center}(A_u)) \ge \Delta$. So the inequality above
must be an equality, which implies that $f(u) + \Delta N(u)$, $y$ and $f(v)$ fall on a line.
Hence, $L(u)$ and $L(v)$ cannot intersect.

3. Follows directly from 2. because $T$ is the union of the fibers.

4. Follows from 1., and 5. follows from 4. □

Proof of Theorem 3. 

1. First we show that $\Gamma_f \subset M(S)$. Pick any $u \in [0, 1]$. Let $B = B(f(u), \sigma)$.
We claim that $B \cap \partial S$ contains at least two points. Let $a = f(u) + \sigma N(u)$
and $b = f(u) - \sigma N(u)$. We will show that $a$ and $b$ are in $B \cap \partial S$.

Note that $a, b \in B$ and hence they are in $S$. In fact they are boundary
points because they are not in the interior of $S$. To show this, suppose to
the contrary that $a$ is interior. Hence there exists $v$ such that $\|a - f(v)\| = \delta < \sigma$.
That is, $f(v)$ is in the interior of $B(a, \sigma)$. But this contradicts the
assumption. The same argument shows that $b \in \partial S$. Hence, $f(u) \in M(S)$
and so $\Gamma_f \subset M(S)$.

Now we show that $M(S) \subset \Gamma_f$. Let $y \in M(S)$. We claim that $y = f(u)$
for some $u$. Suppose not. From Lemma 2, $y \in L(u)$ for some $u$ and
$y \notin L(v)$ for any $v \ne u$. Also, $f(u) \in M(S)$ and $B(f(u), \sigma) \cap \partial S$ contains
$a = f(u) + \sigma N(u)$ and $b = f(u) - \sigma N(u)$. Since $y \in L(u)$ and $y \ne f(u)$,
either $\|y - a\| < \sigma$ or $\|y - b\| < \sigma$. Without loss of generality, assume that
$\|y - a\| < \sigma$. Set $r = \|a - y\|$ and $s = \|y - f(u)\|$ and note that $r + s = \sigma$.
Let $B = B(y, \delta)$ be the medial ball at $y$. If $\delta > r$ then the interior of
$B(y, \delta)$ has nonempty intersection with $\partial S$. So $\delta$ must be less than or equal
to $r$. On the other hand, if $\delta < r$ then $B(y, \delta) \cap \partial S = \emptyset$. So we must have
$\delta = r$. But $B(y, r)$ is strictly contained in $B(f(u), \sigma)$ except for the common
point $a$. Thus, all points in $B(y, r)$ are interior points of $S$ except for $a$. So
$B(y, r) \cap \partial S$ contains fewer than 2 points and hence $y$ cannot be in $M(S)$.

2. The proof that $\Gamma_f \subset M(S)$ is the same as in part 1. Now suppose that
$\|f(1) - f(0)\| > 2\sigma$. We will show that $M(S) \subset \Gamma_f$ and hence $\Gamma_f = M(S)$.
From Equations (15) and (16), recall that $T$, $C_0$, and $C_1$ denote the tube
and the end caps. By Lemma 2, $S = T \cup C_0 \cup C_1$, where $C_0 \cap C_1 = \emptyset$.
Let $y \in M(S)$. If $y \in T$ then the proof of the previous part implies that
$y = f(u)$ for some $u$. That is, $M(S) \cap T \subset \Gamma_f$. Now suppose $y \in C_0$. Then
$d(y, \partial S) = r < \sigma$ for some $r$. We may assume $r > 0$, otherwise $y$ is on
the boundary and cannot be medial. Consider a ball $B(y, \delta)$. We claim that
$B(y, \delta)$ cannot be medial. In fact, if $\delta < r$ then all points in $B(y, \delta)$ are
interior to $S$. If $\delta = r$ then $B(y, \delta)$ intersects $\partial S$ at a single point. Finally,
if $\delta > r$ then $\mathrm{interior}(B(y, \delta)) \cap \partial S \ne \emptyset$. Thus $B(y, \delta)$ cannot be medial and
$C_0 \cap M(S) = \emptyset$. Similarly, $C_1 \cap M(S) = \emptyset$. Hence, $M(S) = \Gamma_f$. □

Proof of Lemma 4. We prove the closed case. The open case is similar. 

1. li y £ AI{S) then y = f{u) for some u by Theorem 3. From lemma 2, 
the closest point on the boundary is either f{u) + aN{u) or f{u) — aN{u). 
In either case, d{y, dS) = a. 

2. We have y = f{u) + tN{u) for some u and t. Since y ^ M{S), it follows 
that y 7^ f{u) and so t / 0. Then, from (1), d{y,dS) < a. 

3. We have y = f{u) +tN{u) for some u and some t £ [— fi, a]. The closest 
boundary point is either f{u) + aN{u) or f{u) — aN{u). Without loss of 
generality, assume it is f{u) + aN{u). Hence, t > 0. So, a = d{f{u),dS) = 
d{f{u),y) + d{y, dS) = d{y, M(5)) + A{y). □ 

Proof of Theorem 5. 

1. Choose any y £ M?. Let z^: be the closest point to y on dS. Let z be 
the closest point to y on dS. Let z be the closest point to z^: on 95". Then 

My) = lly - ^11 < lly - ?ll < l|y - 2*11 + 11^* - S'll 

< l|y - 2*11 + e = ^(y) + e- 

Now let z be the point on dS closest to z. Then 

My) = \\y - z*\ \ < lly - 2|| < ||y - z|| + p - z|| 

< A(y) + e. 

2. Let y = argmax^^g A(y) and let be its closest point in M{S). Then 

3^ = My) < My) + e < A(y*) + e = + e. 

Also 

a = My) > A(y,) > A(y,) - e = a - e. 
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3. By Lemma 2, there is a unique u such that y is on the fiber L{u), 
centered at f{u). So 

a = d{fiu),dS) = \\f{u)-y\\+d{y,dS)>\\f{u)-y\\+diy,dS)-e 
= \\f{u)-y\\+d-e>\\f{u)-y\\+a-2e. 

Hence, d{y,M{S)) < d{y,f{u)) < 2e. □ 

Proof of Lemma 6. Let y be a point in S and let A(y) < a be its distance 
from the boundary dS. If A(y) > e then B{y, e) Ci S = B{y, e) so that 
uiBiy, e)nS) = u{B{y, e)) = vre^ > xiy{B{y, e)). 

Suppose that A{y) < e. Let f{u) be the point on the filament closest 
to y and let y* be the point on the segment joining y to f{u) such that 
\\y ~ y*\\ = The ball A = B{y*,e/2) is contained in both B{y,e) and 
S. Hence, z^(S(y, e) n S") > u{A) = ne'^/A = xv{B{y, e)). This is true for all 
e < cj, hence S is (x, A)-standard for x = 1/4 and \ = a. 

Now we show that S is expandable. By Proposition 1 in Cuevas and 
Rodriguez-Casal (2004) it suffices to show that a ball of radius r rolls freely 
outside S for some r, meaning that, for each y G dS, there is an a such that 
y G B{a,r) C S*^, where 5^ is the complement of S. Let O^^ be the ball of 
radius A — a tangent to y such that Oy C 5^. Such a ball exists by virtue 
of the conditions on o". □ 

Proof of Lemma 8. Suppose first that $f$ is open. Then $\partial S$ is a closed,
simple curve. Let $A_n = S \cup \hat{S}$. We will first show that $\partial A_n$ is a closed,
non-self-intersecting curve for all $n$. Consider one observation $Y_1$ and note that
$A_1 = S \cup B(Y_1, \epsilon_n)$. Since $Y_1 \in S$, $\mathrm{interior}(B(Y_1, \epsilon_n)) \cap S \ne \emptyset$. It is then easy
to see that $\partial A_1$ is a closed, non-self-intersecting curve. A simple induction
argument verifies that $\partial A_n$ is a closed, non-self-intersecting curve for all $n$.
Now, when $S \subset \hat{S}$, we have $A_n = \hat{S}$ and the conclusion follows. The proof
for closed curves is similar. □

Proof of Theorem 9. 

1. First we show that y € F implies that d{y,M) < 4e. Let y £ T. 
Then d{y, dS) > d{y, 95) - e > a - 2e - e > o- - e - 2e - e = a - 4e. So 
d{y, M) = a - d{y, dS) < a - a + = 4e. Now we show that M C f . 
Suppose that y E M. Then, 

d{y, dS) > d{y, dS)-e = a-e>d-2e = a-6 

so that y G F. 

2. The proof of the second statement follows from 1. and Lemma 7. □ 
The next two Lemmas 16 and 17 are needed to prove Theorem 10. 
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Fig 13. Diagram for proof of Lemma 16. 



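Lemma 16 Let $\Gamma$ and $\Gamma_1$ be two curves in $\mathbb{R}^2$ such that $d_H(\Gamma, \Gamma_1) \le \epsilon$.
Given a point $a \in \mathbb{R}^2$, let $a^*$ be the point on $\Gamma$ closest to $a$. Let $d = \|a - a^*\|$.
Consider a ball, with radius $r > d + \epsilon$ and center $\alpha$, that contains $a$, $a^*$ and
no other points in $\Gamma$. Then there exists a point $\hat{a} \in \Gamma_1$ such that $\|a - \hat{a}\| = d(a, \Gamma_1)$ and

(29) $\|\hat{a} - a^*\|^2 \le \frac{4rd}{r - d}\, \epsilon + \epsilon^2.$

Thus, $\|\hat{a} - a^*\| = O(\sqrt{\epsilon})$.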



Proof of Lemma 16. See Figure 13. Consider the ball $A = B(\alpha, r)$ with
$r > d + \epsilon$. Let $a$ be a point along the radius that joins $\alpha$ with $a^*$. Since
$d_H(\Gamma, \Gamma_1) \le \epsilon$, there exists a point $g \in \Gamma$ within $\epsilon$ distance from $\hat{a}$, other
than $a^*$.

Note that $\hat{a} \notin B(\alpha, r - \epsilon)$, otherwise $g$ would be in $B(\alpha, r)$, but, by construction,
$a^*$ is the only point that belongs to $B(\alpha, r) \cap \Gamma$. To show that
$\hat{a} \notin B(\alpha, r - \epsilon)$, assume by contradiction that $\hat{a}$ were in $B(\alpha, r - \epsilon)$; then
$\|\alpha - \hat{a}\| < r - \epsilon$ and

$\|\alpha - g\| \le \|\alpha - \hat{a}\| + \|\hat{a} - g\| < r - \epsilon + \epsilon,$

thus implying that $g \in B(\alpha, r)$.
Now, since $\|a - \hat{a}\| \le d + \epsilon$, then $\hat{a} \in B(a, d + \epsilon) \cap B(\alpha, r - \epsilon)^c$, the shaded
region in Figure 13. Thus, $\|\hat{a} - a^*\| \le \|w - a^*\|$, where $w$ is either one of the
two points where the two balls $B(\alpha, r - \epsilon)$ and $B(a, d + \epsilon)$ cross in Figure 13.
Without loss of generality assume the system of coordinates is such that:

$\alpha \equiv (0, 0), \quad a = (r - d, 0), \quad a^* = (r, 0),$

and the equations of the two balls are:

$B(a, d + \epsilon): \; (x - (r - d))^2 + y^2 = (d + \epsilon)^2$
$B(\alpha, r - \epsilon): \; x^2 + y^2 = (r - \epsilon)^2.$

Thus the coordinates of $w$ are

$x_w = r - \epsilon\, \frac{r + d}{r - d}, \qquad y_w = \sqrt{(r - \epsilon)^2 - x_w^2},$

and the distance between $\hat{a}$ and $a^*$ satisfies

$\|\hat{a} - a^*\|^2 \le \|w - a^*\|^2 = (x_w - r)^2 + y_w^2 = \frac{4rd}{r - d}\, \epsilon + \epsilon^2.$

□

Lemma 17 Suppose $\epsilon < (\Delta - \sigma)/2$. Let $\mathcal{Y}(u) = \{y = f(u) + tN(u) :$
for some $u \in [0, 1]$ and $|t| \le \sigma + \epsilon\}$ be the extended fiber. It can be shown
that the extended fibers $\mathcal{Y}(u) = \{f(u) + tN(u) : -\Delta < t < \Delta\}$ are disjoint.
For a given $u$ let $y \in \mathcal{Y}(u)$. Let $y^*$ be the point of $\partial S_i$ ($i = 0, 1$) closest to $y$.
There exists $\hat{y}_i \in \partial \hat{S}_i$ such that $\|y - \hat{y}_i\| = d(y, \partial \hat{S}_i)$ and

(30) $\|\hat{y}_i - y^*\|^2 \le \frac{16 \Delta^2}{\beta}\, \epsilon + \epsilon^2$

with $0 < \beta < (\Delta - \sigma)/2 - \epsilon$. Hence $\|\hat{y}_i - y^*\| = O(\sqrt{\epsilon})$ uniformly over $\mathcal{Y}(u)$.

Proof of Lemma 17. Let $r = (\sigma + \Delta)/2$ and note that $\sigma + \epsilon < r < \Delta$.
Consider two balls of radius $r$ tangent to the filament at $f(u)$ on either side
of $\Gamma_f$. Both balls contain no points of $\Gamma_f$ other than $f(u)$. Let $\alpha_i$ be the
center of the ball on the side opposite to $y^*$, so that $\alpha_i$ is on the normal
through $f(u)$.

Now we show that the balls $B(\alpha_i, r + \sigma)$, $i = 0, 1$, centered in $\alpha_i$ satisfy the
conditions required in Lemma 16. By construction, $B(\alpha_i, r + \sigma)$ is tangent
to $\partial S_i$ at $y^*$ and $y \in B(\alpha_i, r + \sigma)$. The center $\alpha_i$ of $B(\alpha_i, r + \sigma)$ is on the
normal through $f(u)$, thus $y^*$ is the closest point to $\alpha_i$ on the boundary $\partial S_i$
and there are no other points in $\partial S_i$ interior to the ball. Also $\partial S_i$ cannot be
tangent to the ball in a point $z \ne y^*$, otherwise $\alpha_i$ would be on the extended
fiber $\mathcal{Y}(u')$ for some $u' \ne u$. But the extended fibers $\mathcal{Y}(u) = \{f(u) + tN(u) :
-\Delta < t < \Delta\}$ are disjoint. This shows that $B(\alpha_i, r + \sigma)$ is the ball $A$ of
Lemma 16, with $\alpha = \alpha_i$, $y = a$ and $y^* = a^*$. Hence, from Lemma 16:

$\|\hat{y}_i - y^*\|^2 \le \frac{4(r + \sigma) d}{r + \sigma - d}\, \epsilon + \epsilon^2$

where $d = \|y - y^*\| \le 2\sigma + \epsilon$. The result follows since $r + \sigma < 2\Delta$, $d \le
2\sigma + \epsilon < 2\Delta$ and $r + \sigma - d \ge (\Delta - \sigma)/2 - \epsilon \ge \beta$. □
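The bound (29) of Lemma 16, which the proof above applies with $r + \sigma$ in place of $r$, comes from locating the crossing point $w$ of two circles. The following sketch checks that computation numerically in the coordinates of the proof ($\alpha = (0,0)$, $a = (r-d, 0)$, $a^* = (r, 0)$). The function names are illustrative assumptions.

```python
import math


def crossing_point(r, d, eps):
    """Upper crossing point w of the circles
    x^2 + y^2 = (r - eps)^2 and (x - (r - d))^2 + y^2 = (d + eps)^2."""
    x_w = r - eps * (r + d) / (r - d)
    y_sq = (r - eps) ** 2 - x_w ** 2
    if y_sq < 0:
        raise ValueError("the circles do not cross for these values")
    return x_w, math.sqrt(y_sq)


def squared_bound(r, d, eps):
    """Right-hand side of (29): 4 r d / (r - d) * eps + eps^2."""
    return 4.0 * r * d / (r - d) * eps + eps ** 2


def squared_distance_to_a_star(r, d, eps):
    """Squared distance from w to a* = (r, 0)."""
    x_w, y_w = crossing_point(r, d, eps)
    return (x_w - r) ** 2 + y_w ** 2
```

For any admissible $r > d + \epsilon$, the squared distance from $w$ to $a^*$ equals the right-hand side of (29), so the bound is attained exactly at the crossing point.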




z(y) 



Fig 14. First illustration for the proof of Theorem 10. 

Proof of Theorem 10. See Figures 14 and 15. 

1. Recall that a = maXy^^ d{y, dS) < a + e and that for each f{u) £ Tf 
we have d{f{u),dSo) = d{f{u),dSi) = a. 

(i) Let /U G r, and let y E dSo and z{y) G dSi be the points that generated 
it, as in Figure 14. Let i{y, z{y)) be the line segment that joins y to z{y). The 
distance between any x G £{y, z{y)) and the boundary curves is, respectively, 
d{x, dSi) = \ \x — z{y)\ \ and d{x, dSo) < | |x— y| |. The midpoint /2 on £{y, z{y)) 
is such that — z{y)\\ < a and | |/I — y| | < d. Consider the point f{u) G Fj 
at the intersection between Fj and i{y,z{y)). 

To show that — < 2e, and f{u) belongs to the ball B(jl,2e) 

in Figure 14, suppose to the contrary that — /u|| > 2e, then either 
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\\fiu)-y\\ <d-2eoi ||/(n) - z(y)|| < a - 2e. But if ||/(n)-y|| < a - 2e, 
then 

d{f{u),dSQ)<d{f{u),dSa) + e<\\f{u)-y\\+€<d-2e + e<a 

which contadicts the fact that d{f{u), dSo) = cr. If, instead ||/('u) — z{y)\\ < 
a — 2e, then 

difiu), dSi) < difiu),dSi) + e = Wfiu) - ziy)\\ + e<a-2e + e<a 
that contradicts the fact that d{f{u),dSi) = a. 




Fig 15. Second illustration for the proof of Theorem 10. 



(a) Let f{u) S Tj, and let y* and z* be its closest points on dSo and dSi 
respectively, as in Figure 15. By construction, f{u) is on the midpoint of the 
segment ^(y*, z*), hence f{u) = {y* + z*)/2. Consider a point y G dSo such 
that II?/* — 2/11 < e. Let z{y) and h be the projections of y on dSi and dSi 
respectively. The midpoint fi = {y + z{y))/2 belongs to T. From Lemma 17, 
\\z{y) — h\\ < Cl{^/e) uniformly. Moreover, from Lemma 19 that follows 
below, we have ||/i — 2:*|| < uniformly. Hence: 

\\z{y)-z*\\ < \\z{y)-h\\ + \\h-z*\\<Ci{V~e) + C2e<C3V~e. 
It follows that 

X y* + z* y + z{y) ||y*-y|| + ||y*-y|| C1 + C3 ^ 

\\fiu)-m= — ^ ^ — < ^ < — ^ Ve- 
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(in) is a consequence of (i) and (ii). 

2. The second statement follows from statement 1. and Lemma 7. □ 

Some terminology and the next Lemma 18 are needed for stating and 
proving Lemma 19. 

Now we examine the two disjoint curves $\partial S_i$, $i = 0, 1$, that constitute the
boundary $\partial S$ when $\Gamma_f$ is closed, and the set $\partial T$ when $\Gamma_f$ is open. For each
boundary curve $\partial S_i$ we can distinguish two sides: one side that faces towards
$\Gamma_f$ and a second side that faces away from $\Gamma_f$. Each point $x \in \partial S_i$ supports
two tangent balls that contain no other points of $\partial S_i$, one on each side.

Analogously to the definition of thickness of a curve in Section 2, we
define the Outer Thickness of the boundary $\partial S_i$ to be the minimum radius
of curvature $r_O$ of all the balls tangent to one point of $\partial S_i$ on the side facing
away from $\Gamma_f$. We also define the Outer Critical Ball $O_x$ to be the ball facing
away from $\Gamma_f$ and tangent to any point $x \in \partial S_i$, with radius $r_O$. Similarly,
we define the Inner Thickness of the boundary $\partial S_i$ to be the minimum radius
of curvature $r_I$ of all the balls tangent to one point of $\partial S_i$ on the side facing
towards $\Gamma_f$, and the Inner Critical Ball $I_x$ to be the ball facing towards $\Gamma_f$
and tangent to any point $x \in \partial S_i$, with radius $r_I$. Both balls can roll freely
on the side of $\partial S_i$ where they are constructed, but not necessarily on the
other side. The thickness of the boundary curves is $\Delta(\partial S_i) = \min\{r_O, r_I\}$.

Lemma 18 For every point $y \in \partial S_i$ the outer critical ball $O_y$ has radius
$r_O = \Delta - \sigma$ and the inner critical ball $I_y$ has radius $r_I = \Delta + \sigma$. Thus the
thickness of $\partial S_i$, $i = 0, 1$, is $\Delta(\partial S_i) = \Delta - \sigma$.

Proof of Lemma 18. We start with the inner ball. Let $y$ be a point on
the boundary $\partial S_i$. Hence, $y = f(u) + \sigma N(u)$ say. Let $A = B(c, \sigma + \Delta)$
where $c = f(u) - \Delta N(u)$. We claim that if $x$ is any other point on $\partial S_i$
then $x \notin A$. Let $\ell$ be the line segment connecting $x$ to $c$. We will show that
the length of $\ell$ is strictly larger than $\sigma + \Delta$. Now $x = f(v) + \sigma N(v)$ for
some $v$. The line $\ell$ crosses $\Gamma_f$ at some point $f(t)$. The closest point on $\Gamma_f$
to $x$ is $f(v)$ and the distance from $x$ to $f(v)$ is $\sigma$. Hence, $\|x - f(t)\| \ge \sigma$.

Let $A' = B(c, \Delta)$. Then $f(t) \notin A'$ and hence $\|f(t) - c\| > \Delta$. Therefore,

$\|x - c\| = \|x - f(t)\| + \|f(t) - c\| > \sigma + \Delta$ as required. The proof for the
outer ball is similar. □

Lemma 19 Let $y \in \partial S_i$, and let $y'$ be a second point in $\mathbb{R}^2$, such that
$\|y - y'\| < \Delta - \sigma$. Denote by $z$ and $z'$, respectively, the projections of $y$ and
$y'$ on $\partial S_{1-i}$; then the distance between $z$ and $z'$ is

$d(z, z') = \|z - z'\| \le 2\, \frac{\Delta + \sigma}{\Delta - \sigma}\, \|y - y'\|.$

Proof of Lemma 19. If $\|y' - z\| \le \|y' - y\|$ then

$\|z - z'\| \le \|z - y'\| + \|y' - z'\| \le 2\|y' - z\| \le 2\|y - y'\|.$

If instead $\|y' - z\| > \|y' - y\|$ (see Figure 16), let $c$ be the center of the
outer critical ball $O_y = B(c, \Delta - \sigma)$, and let $\theta$ be the angle $\angle ycy'$. Consider
the triangle with vertices in $y$, $c$ and $y'$. Since $\|c - y\| = \Delta - \sigma$, from the law
of sines applied to $\theta$ and to the angle facing $\ell(c, y)$

$\sin\theta = \sin(\angle cy'y)\, \frac{\|y - y'\|}{\|c - y\|} \le \frac{\|y - y'\|}{\Delta - \sigma}.$

Now we show that the point $z'$, projected from $y'$ onto $\partial S_{1-i}$, lies in the
shaded region of Figure 16. In fact, the inner critical ball $I_z$, tangent to $z$,
is such that $z' \notin B(c, \Delta + \sigma) = I_z$. Moreover, since $z'$ is the closest point to
$y'$, it follows that $\|y' - z'\| \le \|y' - z\|$. Thus $z' \in B(y', \|y' - z\|)$, and so
$z' \in B(y', \|y' - z\|) \cap B(c, \Delta + \sigma)^c$, the shaded region in Figure 16. The two
balls $B(y', \|y' - z\|)$ and $B(c, \Delta + \sigma)$ intersect in the two points $z$ and $w$.

Also, in the following Lemma 20, we show that $\|y - y'\| < \Delta - \sigma$ implies
that $w$ is the farthest point from $z$ in the shaded region.

The angle $\angle zcw = 2\theta$ and the length of the chord $\ell(z, w)$ is $\|z - w\| =
2(\Delta + \sigma) \sin\theta$. Thus

$\|z - z'\| \le \|z - w\| = 2(\Delta + \sigma) \sin\theta \le 2(\Delta + \sigma)\, \frac{\|y - y'\|}{\Delta - \sigma}.$

□
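The last step of the proof of Lemma 19 uses the chord identity $\|z - w\| = 2(\Delta + \sigma)\sin\theta$ for two points on the circle of radius $\Delta + \sigma$ separated by the angle $2\theta$. A small numerical check, using the coordinates $c = (0,0)$, $z = (\Delta + \sigma, 0)$ and $w = ((\Delta+\sigma)\cos 2\theta, (\Delta+\sigma)\sin 2\theta)$ from the proof of Lemma 20, is sketched below; the function names are hypothetical.

```python
import math


def chord_z_w(delta, sigma, theta):
    """Distance between z and w on the circle of radius delta + sigma,
    where the angle zcw at the centre c = (0, 0) equals 2 * theta."""
    radius = delta + sigma
    z = (radius, 0.0)
    w = (radius * math.cos(2.0 * theta), radius * math.sin(2.0 * theta))
    return math.dist(z, w)


def lemma19_bound(delta, sigma, dist_y):
    """Right-hand side of Lemma 19: 2 (delta + sigma) / (delta - sigma) * ||y - y'||."""
    return 2.0 * (delta + sigma) / (delta - sigma) * dist_y
```

When $\sin\theta$ takes its largest admissible value $\|y - y'\| / (\Delta - \sigma)$, the chord length coincides with the bound of Lemma 19.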

Lemma 20 If $\|y - y'\| < \Delta - \sigma$ then $w$ is the farthest point from $z$ in the
shaded area of Figure 16.

Proof of Lemma 20. See Figure 16. While keeping the angle $\theta$ fixed, and
as long as $\|y - y'\| < \Delta - \sigma$, one can move the location of $y'$ along the
radius of $B(c, \Delta + \sigma)$ from its center $c$ through $y'$. Let $h = (z + w)/2$ be
the midpoint between $z$ and $w$. If $y'$ is chosen on the segment $\ell(c, h)$, then
repeating the proof of Lemma 19 generates the same point $w$ as in Figure
16, and $w$ is still the point in the shaded region farthest away from $z$.

If, instead, y' is chosen along the line from h onwards, then, by construc- 
tion, there are points in the shaded region that have distance from z larger 
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\b(c, a + a) 



Fig 16. Illustration for the proof of Lemma 19 and Lemma 20. 

than \ \z — w\\. But such y' will violate the condition ||y — < A — o". In 
fact, assume without loss of generality that the system of coordinates is such 
that: 

c=(0,0), y=(A-a,0), z=(A + ct,0), 
and the coordinates of w and h are 

w= ( (A+cr)cos2 6l, (A+a)sin2 6' ); h= ( ^ ^ ^ (1+cos 2 g), ^ ^ ^ sin 2 6* 



By construction, if y' lies along the line from h onwards, then ||y — > 
\\y — h\. This implies that ||y — > ||y — and 

A + (t\^ /A-3o-\^ /A + cr\ /A-3cr 



> 

= (A -a 
□ 



2 

2 
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Proof of Lemma 11. This follows from the fact that dSo and dSi are each 
closed simple curves and that each consists of finitely many arcs of a circle. 
□ 

Proof of Theorem 12. The fact that /* is a simple closed curve is straight- 
forward. We have already shown that each fitted value is within distance e„ 
of M{S). It is easy to see that this is true of the linear completion as well. 
We still need to show that for each y G M{S) there is a fitted value with 
distance 0(en). 

Choose any $y = f(u) \in M(S)$. The fiber $L(u)$ divides $S$ into two disjoint
sets. Let $\hat{y}_1$ be the fitted value closest to $y$ from the first set and let $\hat{y}_2$ be the
fitted value closest to $y$ from the second set. Let $\hat{\ell} = \{\alpha \hat{y}_1 + (1 - \alpha) \hat{y}_2 : 0 \le
\alpha \le 1\}$. Let $y_1^*$ be the projection of $\hat{y}_1$ onto $M(S)$ and $y_2^*$ be the projection
of $\hat{y}_2$ onto $M(S)$. Let $\ell$ be the line connecting $y_1^*$ and $y_2^*$. Since the endpoints
of $\hat{\ell}$ and $\ell$ are $O(\epsilon_n)$ apart, it follows that $d_H(\hat{\ell}, \ell) = O(\epsilon_n)$.

There are two balls $B_1$ and $B_2$ of radius $\Delta$ passing through $y_1^*$ and $y_2^*$.
The arc of the curve $\Gamma_f$ from $y_1^*$ to $y_2^*$ is contained in the lens $A = B_1 \cap B_2$.
(If not, then a ball of radius $\Delta$ could not roll freely.) So $d(y, \ell) \le d(y, \partial A)$.
But $d(y, \partial A)$ is simply the distance from the chord of a circle to the circle,
where the chord has length $O(\sqrt{\epsilon})$. It follows that $d(y, \partial A) = O(\epsilon)$. Finally,
$d(f(u), \hat{M}^*) \le d(y, \ell) + d_H(\hat{\ell}, \ell) = O(\epsilon)$. □
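The step from a chord of length $O(\sqrt{\epsilon})$ to a distance of order $O(\epsilon)$ is the standard sagitta computation: in a circle of radius $R$, a chord of length $c$ lies at most $R - \sqrt{R^2 - c^2/4} \approx c^2/(8R)$ from the arc it subtends. The sketch below illustrates this; the function name `sagitta` is ours and does not appear in the paper.

```python
import math


def sagitta(radius, chord):
    """Maximum distance between a chord of the given length and the
    (minor) circular arc it subtends, for a circle of the given radius."""
    if chord < 0 or chord > 2.0 * radius:
        raise ValueError("chord length must lie in [0, 2 * radius]")
    return radius - math.sqrt(radius ** 2 - (chord / 2.0) ** 2)
```

Taking `chord = math.sqrt(eps)` gives a sagitta close to `eps / (8 * radius)`, which is linear in $\epsilon$ as the proof requires.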

Proof of Theorem 15. For the open-curve case with Sq = £1 = , claim 
1. follows directly from Theorem 25. For the open-curve case with £q = dA 
and £1 =M?, as used in the general variant of the algorithm, note that the 
endpoint in dA is a distance < 8e from UndA, where U is the corresponding 
component of P. Moreover, every point in A lies within 8e of ACiM. Therefore 
1. also follows from Theorem 25. 

For the closed case, all that must be proved is that the estimated curve F 
is a closed curve that lies within F and that has (absolute) winding number 
1 around a point yo in the inner component of S^. Notice also that in the 
closed case, both the closed and general variants of the algorithm produce 
the same curve. 

First, recall that d{y,M) < 2e and notice that the unique fiber through 
y, L(uq), intersects with F in a line segment of length < 8e. This portion 
of a fiber is thus contained in the set As (defined in EDT curve extraction 
algorithm for the closed curve case) and thus in A; in fact, the fibers L(u)r]T 
for u in an open set containing uq are also contained in F — ^ is thus cut 
at a fiber and contains a single connected component because e <C A(/). (If 
the latter were false, two separated parts of / would lie within 8e < A of 
each other.) 
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Second, applying the open curve algorith with Sq = £i = dA produces a 
curve within T that connects one side of A to the other. (If the latter were 
false, the curve between the endpoints would have length < 8e, but a longer 
minimum path length can be obtained by winding around yo- The winding 
number cannot be greater than 1 because the curve in F — ^ is not closed.) 
A path between these end points that is contained in A closes this curve 
and keeps it within F. The resulting closed curve thus lies within F and has 
(absolute) winding number 1 with respect to i/q. Claim 2. follows. □ 

Lemma 21 Suppose $S$ is a compact, connected set in $\mathbb{R}^2$. Then,

1. If $y \in S^c$ and $x \in S$ and if $L$ is the line segment from $x$ to $y$, then

$L \cap \partial S \ne \emptyset.$

2. Fix $r > 0$. If $B(x, r) \cap S \ne \emptyset$ but $B(x, r) \cap \partial S = \emptyset$, then $B(x, r) \subset S$.

Proof.

1. Define $d_* = \inf\{d(x, w) : w \in L \cap S^c\}$. We know that the infimum
exists because $L$ is a compact set and that there is a unique point $z \in L$ for
which $d(x, z) = d_*$. It follows directly that every neighborhood of $z$ contains
a point in $S$ and a point in $S^c$, so $z \in \partial S$.

2. Suppose the conclusion does not hold; that is, there exists a $y \in
B(x, r) \cap S^c$. By assumption, there is a $z \in B(x, r) \cap S$. Apply Result
1 in the lemma to the line segment between $z$ and $y$, which is contained in
$B(x, r)$ by convexity. This implies that $B(x, r) \cap \partial S \ne \emptyset$, contradicting the
initial supposition. The result follows. □

6.2. Open Curves. This subsection deals with two issues related to open 
curves. First, the EDT curve extraction algorithm (Subsection 3.4) for open 
curves required that we estimate the endpoints of the curve. Second, for 
constructing the medial estimator in the open curve case, the estimated 
boundary $\partial \hat{S}$ needs to be split into two pieces $\partial \hat{S}_0$ and $\partial \hat{S}_1$. Both issues are
addressed here after the following discussion of some basic properties of open
curves.

Let $f$ be an open curve and let $\partial S$ be the boundary of the support. Define

$E_0(c) = B(f(0), \sigma + c) \cap \partial S$

and

$E_1(c) = B(f(1), \sigma + c) \cap \partial S$

to be the extended end caps of the support's boundary. Let $E_0 = E_0(0)$ and
$E_1 = E_1(0)$. Define

$V_0(a) = E_0 \cup \left( \bigcup \{ f(u) \pm \sigma N(u) : 0 \le u \le a \} \right)$



Let ip denote the arclength of /. Because / is parameterized by arclength 
normalized to [0,1], it follows that the gradient /' of the filament is such 
that = f for all u. Define 



Theorem 22 Let c > 0. 

1. Eo{ce) C Vo{a^/e) and Ei{ce) C Vi{a^/e). 

2. Let b = aip{l + a/A). Then Vo{a^) C Eq ® h^e and ^1(0^^) ^ 
El e b^/e. 

3. Let d = If a. Then Vo{ay/e) C Eo{d^/e) and Viia^/e) C Ei{d^/e). 
Proof. (See Figure 17). 

1. Let y G Eo{ce). We will show that y E Vo{a^/e). Note that y cannot belong 
to Eq hence, being y E dS, necessarily y = f{u) ± aN{u) for some u. We 
only need to prove that u < a\fe. Moreover, since for all u 



proving that ||/(0) — /(ti)|| < (/^a-^/e would be sufficient for the claim. 

We know that $\|y - f(0)\| = \sigma + c\epsilon$ and that $y = f(u) + \sigma N(u)$ for some $u$.
The line from $f(u)$ to $y$ defines the direction of the normal at $f(u)$. Extend
the normal at $f(u)$ to the point $z = f(u) + \Delta N(u)$, so that $\|z - f(u)\| = \Delta$
and $\|z - f(0)\| > \Delta$, hence $z$ must lie outside the circle $C_3 = B(f(0), \Delta)$.
Let $A$ be the intersection between $C_3$ and the segment from $y$ to $z$; such an
intersection exists because $y \in C_3$ and $z \notin C_3$. We have



A-/(0)||=A, \\A- f{u)\\= A-h and \\A - y\\ = A - a - h, 



(31) 



Vi{a) 



Ei[J{U{f{u)±aN{u) : 1 - a < u < 1}) . 



(32) 




\\f{0)-f{u)\\>ipu, 



where h is positive. 
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Fig 17. Diagram for Theorem 22. 

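Consider the triangle with vertices $A$, $f(0)$ and $y$ and denote by $\theta$ the
angle at $y$. The cosine theorem gives

$$\|A - f(0)\|^2 = \|A - y\|^2 + \|f(0) - y\|^2 - 2\|A - y\|\, \|f(0) - y\| \cos\theta$$

so that

$$\Delta^2 = (\Delta - \sigma - h)^2 + (\sigma + c\epsilon)^2 - 2(\Delta - \sigma - h)(\sigma + c\epsilon) \cos\theta$$

and

$$\cos\theta = \frac{(\Delta - \sigma - h)^2 + (\sigma + c\epsilon)^2 - \Delta^2}{2(\Delta - \sigma - h)(\sigma + c\epsilon)}.$$

Now consider the triangle with vertices $f(0)$, $f(u)$ and $y$, where the angle
at $y$ is $\pi - \theta$. From the cosine theorem we obtain

$$\|f(0) - f(u)\|^2 = \|f(0) - y\|^2 + \|y - f(u)\|^2 - 2\|f(0) - y\|\, \|y - f(u)\| \cos(\pi - \theta).$$

And, since $\cos(\pi - \theta) = -\cos\theta$,

$$\begin{aligned}
\|f(0) - f(u)\|^2 &= (\sigma + c\epsilon)^2 + \sigma^2 + 2(\sigma + c\epsilon)\, \sigma \cos\theta \\
&= (\sigma + c\epsilon)^2 + \sigma^2 + \sigma\, \frac{(\Delta - \sigma - h)^2 + (\sigma + c\epsilon)^2 - \Delta^2}{\Delta - \sigma - h} \\
&= (\sigma + c\epsilon)^2 + \sigma^2 + \sigma(\Delta - \sigma - h) + \sigma\, \frac{(\sigma + c\epsilon)^2 - \Delta^2}{\Delta - \sigma - h}.
\end{aligned}$$

Note that for small $\epsilon$ (as long as $\Delta^2 > (\sigma + c\epsilon)^2$)

$$g(h) \equiv (\sigma + c\epsilon)^2 + \sigma^2 + \sigma(\Delta - \sigma - h) + \sigma\, \frac{(\sigma + c\epsilon)^2 - \Delta^2}{\Delta - \sigma - h}$$

is a decreasing function of $h$ and then for all $h \ge 0$

$$g(h) \le g(0) = \frac{\epsilon (c^2 \epsilon + 2\sigma c) \Delta}{\Delta - \sigma}.$$

As a consequence

$$\|f(0) - f(u)\|^2 \le \frac{\epsilon (c^2 \epsilon + 2\sigma c) \Delta}{\Delta - \sigma}$$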

and 

11/(0) - /(n)|| < ^e^^^^ = V~e^a{c,e). 
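The two facts used in part 1 of this proof, that the expression for $\|f(0) - f(u)\|^2$ decreases in $h$ and that its value at $h = 0$ simplifies to $\epsilon(c^2\epsilon + 2\sigma c)\Delta/(\Delta - \sigma)$, can be checked numerically. In the sketch below the name `g` for that expression and the argument order are our own labelling, not notation confirmed by the source.

```python
def g(h, delta, sigma, c, eps):
    """Squared distance ||f(0) - f(u)||^2 written as a function of h,
    following the two applications of the cosine theorem in the proof."""
    a = sigma + c * eps          # ||y - f(0)||
    m = delta - sigma - h        # ||A - y||
    return a ** 2 + sigma ** 2 + sigma * m + sigma * (a ** 2 - delta ** 2) / m


def g_at_zero_closed_form(delta, sigma, c, eps):
    """Simplified value of g(0): eps * (c^2 eps + 2 sigma c) * delta / (delta - sigma)."""
    return eps * (c ** 2 * eps + 2.0 * sigma * c) * delta / (delta - sigma)
```

Since the closed form is of order $\epsilon$, its square root is of order $\sqrt{\epsilon}$, which is the rate appearing in claim 1 of Theorem 22.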

2. Let y = /(u) + f7A^(ti) e Fo(a\/e)- Let yo = /(O) + cjiV(0). Then yo G -E^o 
and 

||2/-yo|| < ||/(n)-/(0)||+cT||iV(n)-Ar(0)|| < ^n+^ < a^e^ (l + ^ 
where we used Theorem l(iii) of Walther (1997), namely, 

l|A-M-ivw||< ll^'">-^WII . 

3. Let y = f{u) + o-iV(u) G VQ^a^e). Then 

||y-/(0)|| < ||/(n)-/(0)||+(T<¥.n + a<a + (/.a^/e. 

□ 
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6.3. Estimating the Endpoints. In this subsection we derive estimators 
for $f(0)$ and $f(1)$. First we will need some lemmas. Let $\hat\Gamma$ be the EDT
estimator.

Lemma 23 For fixed $\epsilon > 0$, the set $\hat\Gamma$ has the following properties. Suppose
$u \in \hat\Gamma$ and $\hat\Gamma_u$ is the intersection of $\hat\Gamma$ and the fiber of $S$ containing $u$.

1. If $f$ is closed, then $\hat\Gamma_u$ is a connected line segment through $u$.

2. If $f$ is open and $u$ lies at least $2\epsilon$ from $f(0)$ and $f(1)$ then $\hat\Gamma_u$ is a
connected line segment.

Proof. Without loss of generality, we can assume that the fiber through $u$ is
oriented vertically and that $u = (0, \Delta)$. $\Gamma$ must lie above the circle of radius
$\Delta$ centered on the origin, and thus the boundary of the support (on that
side of $\Gamma$) must lie above the circle of radius $\Delta + \sigma$ centered on the origin.
It follows that the outer portion of $\partial\hat S$ must lie above the circle of radius
$\Delta + \sigma - \epsilon$ centered on the origin.

First, consider the point $x = (0, y)$ where $y = \Delta + h\epsilon$ for $0 \le h \le 4$ on the
fiber through $u$. Let $d = \sigma - \delta = \sigma - c\epsilon$ for some $c \in [-1, 3]$ and $r = \Delta + \sigma - \epsilon$.
And let $z = (r\sin\theta, r\cos\theta)$. We want to find the maximum $|\theta|$ such that
$\|x - z\| \le d$; this will limit the range of closest points.

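We have that

$$\|x - z\|^2 = r^2\sin^2\theta + (y - r\cos\theta)^2 = r^2 + y^2 - 2ry\cos\theta.$$

Taking $d^2 \ge r^2 + y^2 - 2ry\cos\theta$ yields

$$\cos\theta \ge \frac{(r - y)^2 + 2ry - d^2}{2ry} = 1 - \frac{d^2 - (r - y)^2}{2ry} = 1 - \frac{(\sigma - c\epsilon)^2 - (\sigma - \epsilon - h\epsilon)^2}{2(\Delta + \sigma - \epsilon)(\Delta + h\epsilon)}.$$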



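Hence,

(33) $|1 - \cos\theta| \le \dfrac{(1 + h^2 + c^2)\epsilon^2 + 2\sigma(1 + h - c)\epsilon}{(\Delta + \sigma - \epsilon)(\Delta + h\epsilon)}$

(34) $\phantom{|1 - \cos\theta|} = \dfrac{\epsilon}{\Delta} \cdot \dfrac{\Delta}{\Delta + \sigma - \epsilon} \cdot \dfrac{(1 + h^2 + c^2)\epsilon + 2\sigma(1 + h - c)}{\Delta + h\epsilon}$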



and thus | sin(0)| < 2^^^. 

Second, consider a wedge of half angle $\theta > 0$ around the vertical axis.
Consider points $x_0 = (0, y_0)$ and $x_1 = (0, y_1)$ where $y_0 = \Delta + h_0\epsilon$ and
$y_1 = \Delta + h_1\epsilon$ with $0 \le h_1 < h_0$. Let $z = (r\sin\varphi, r\cos\varphi)$ for $r \ge \Delta + \sigma - \epsilon$
and $|\varphi| \le \theta$. We want to find the value of $\theta$ such that $\|x_0 - z\| \le \|x_1 - z\|$ for
all such $z$. In this wedge, in other words, distance to the estimated boundary
is monotone along the filament.

We have

$$\|x_i - z\|^2 = r^2\sin^2\varphi + (y_i - r\cos\varphi)^2 = r^2 + y_i^2 - 2ry_i\cos\varphi.$$

Hence, $\|x_0 - z\| \le \|x_1 - z\|$ requires that

$$2r\cos\varphi\,(h_0 - h_1)\epsilon \ge (h_0 - h_1)\epsilon\,\bigl(2\Delta + (h_0 + h_1)\epsilon\bigr)$$

or equivalently

$$\cos\varphi \ge \frac{\Delta + \frac{h_0 + h_1}{2}\epsilon}{r}.$$

This is satisfied whenever $\cos\varphi \ge 1/2$ or equivalently when $|\varphi| \le \pi/3$.
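The equivalence between the distance comparison and the cosine condition can be illustrated numerically. The sketch below compares the two tests on a grid of angles for one set of parameters. The function names and all numerical values are assumptions chosen for illustration. The check covers only the algebraic equivalence, not the threshold on the wedge angle.

```python
import math

def closer_to_x0(delta, eps, h0, h1, r, phi):
    """Direct test of ||x0 - z|| <= ||x1 - z|| for z = (r sin phi, r cos phi)."""
    y0, y1 = delta + h0 * eps, delta + h1 * eps
    zx, zy = r * math.sin(phi), r * math.cos(phi)
    return math.hypot(zx, y0 - zy) <= math.hypot(zx, y1 - zy)

def cosine_condition(delta, eps, h0, h1, r, phi):
    """Equivalent test cos(phi) >= (delta + (h0 + h1) * eps / 2) / r."""
    return math.cos(phi) >= (delta + 0.5 * (h0 + h1) * eps) / r

# Illustrative values only (assumed, not from the paper).
delta, eps, h0, h1, r = 1.0, 0.05, 2.0, 1.0, 1.5
phis = [k * 0.1 for k in range(-15, 16)]
all_agree = all(
    closer_to_x0(delta, eps, h0, h1, r, p) == cosine_condition(delta, eps, h0, h1, r, p)
    for p in phis
)
```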

Combining these two parts, we see that the closest point to the boundary
must lie within a wedge of angular extent $O(\sqrt{\epsilon/\Delta})$, which is contained in
the wedge for which distance to the estimated boundary is monotone along
the filament. Claim 1 follows. For open curves, claim 2 follows from the
same argument for a point $u$ for which the fiber through $u$ is sufficiently far
from the endcaps. □

Lemma 24 $\hat\Gamma$ has a finite piecewise $C^2$ (two continuous derivatives) boundary.

Proof. Because $\partial\hat\Gamma = \partial\hat\Gamma^c$, it is sufficient to show that $\hat\Gamma^c$ has a piecewise
smooth boundary. Since $\hat\Gamma$ consists of all points $x \in \hat S$ such that $d(x, \partial\hat S) \ge
\sigma_n - \delta = w$ for some constant $\delta > 0$, it follows that

(35) $\hat\Gamma^c = \hat S^c \cup \Bigl( \bigcup_{z \in \partial\hat S} B(z, w) \Bigr).$

Thus, $\partial\hat\Gamma^c$ consists of the points in $\hat S$ that are exactly $w$ away from $\partial\hat S$.
Because $\partial\hat S$ is a finite union of circular arcs of radius $\epsilon$, it follows that $\partial\hat\Gamma^c$
($= \partial\hat\Gamma$) is the boundary of a finite union of $w$-enlargements of circular arcs.
A set that is a finite union of sets with piecewise smooth boundaries itself
must have a piecewise smooth boundary. Thus, it is sufficient to show that
the $w$-enlargement of a single circular arc has a piecewise smooth boundary.

To do this, let $\mathcal{A}$ be a circular arc, which we can take without loss of
generality to be of the form

$$\mathcal{A} = \{(r\cos t, r\sin t) : t \in [-\theta_0, \theta_0]\},$$

for $\theta_0 \in [0, \pi)$. Let $x_+ = r(\cos\theta_0, \sin\theta_0)$ and $x_- = r(\cos\theta_0, -\sin\theta_0)$ be the
two endpoints. Let $\nu(x)$ denote the point(s) in $\mathcal{A}$ that is (are) closest to
$x \in \mathbb{R}^2$. For $x$ in the cone $\lambda_- x_- + \lambda_+ x_+$ for $\lambda_-, \lambda_+ \ge 0$, $\nu(x) = rx/\|x\|$.
For $x$ on the negative horizontal axis, $\nu(x)$ contains the two endpoints of
the arc. For all other $x$, $\nu(x)$ contains the endpoint of the arc on the same
side of the horizontal axis. It follows that the set of points $x$ for which
$d(x, \nu(x)) = w$ is a union of three circular arcs: one in the cone consisting
of points at radius $r + w$, one for $x_-$ consisting of part of the circle around
$x_-$, and one for $x_+$ consisting of part of the circle around $x_+$. This proves
the lemma. □
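The case analysis for the closest point on a circular arc can be written as a short Python sketch. It tests whether the polar angle of the query point falls within the angular range of the arc, which is how the cone condition is interpreted here. The function names are hypothetical, and the query point is assumed not to be the origin.

```python
import math

def nearest_on_arc(px, py, r, theta0):
    """Closest point(s) of the arc {(r cos t, r sin t) : |t| <= theta0} to (px, py).

    Assumes (px, py) is not the origin and 0 <= theta0 < pi.
    Returns a list, since two points tie on the negative horizontal axis.
    """
    x_plus = (r * math.cos(theta0), r * math.sin(theta0))
    x_minus = (r * math.cos(theta0), -r * math.sin(theta0))
    alpha = math.atan2(py, px)            # polar angle of the query point
    if abs(alpha) <= theta0:              # query lies in the angular range of the arc
        norm = math.hypot(px, py)
        return [(r * px / norm, r * py / norm)]
    if py == 0:                           # negative horizontal axis: both endpoints tie
        return [x_minus, x_plus]
    return [x_plus] if py > 0 else [x_minus]

def dist_to_arc(px, py, r, theta0):
    """Euclidean distance from (px, py) to the arc."""
    qx, qy = nearest_on_arc(px, py, r, theta0)[0]
    return math.hypot(px - qx, py - qy)
```

The level set of `dist_to_arc` at height $w$ is then made of pieces of circles centered at the origin and at the two endpoints, matching the three arcs described in the proof.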

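Theorem 25 Let $f$ be an open curve. Let $\mathcal{P}_{u,v}$ denote the set of paths between
$u, v \in \hat\Gamma$ that are contained in $\hat\Gamma$. Define $x, y \in \hat\Gamma$ by

(36) $x, y = \operatorname{argmax}_{u, v \in \hat\Gamma} \; \min_{\pi \in \mathcal{P}_{u,v}} \operatorname{length}(\pi),$

where length denotes the arclength of the path.
Then,

(37) $d_H(\{x, y\}, \{f(0), f(1)\}) \le 16\epsilon.$

The two quantities $x, y$ defined in equation (36) are the estimates of the
endpoints.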
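As an illustration of the definition in (36), the following Python sketch finds the pair of points that maximizes the within-set shortest-path length on a discretized set. Representing the set as grid cells with 4-connected unit steps is an assumption made here for simplicity, and the function names are hypothetical. The paper works with arclength in the continuous set.

```python
from collections import deque

def bfs_distances(cells, start):
    """Shortest 4-connected path lengths (in grid steps) from start, staying inside cells."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        i, j = queue.popleft()
        for nb in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if nb in cells and nb not in dist:
                dist[nb] = dist[(i, j)] + 1
                queue.append(nb)
    return dist

def farthest_pair(cells):
    """Pair (u, v) of cells maximizing the within-set shortest-path length."""
    best = (-1, None, None)
    for u in sorted(cells):
        dist = bfs_distances(cells, u)
        v, d = max(dist.items(), key=lambda kv: (kv[1], kv[0]))
        if d > best[0]:
            best = (d, u, v)
    return best[1], best[2], best[0]

# An L-shaped strip of cells standing in for a discretized estimate (illustrative).
cells = {(i, 0) for i in range(6)} | {(5, j) for j in range(1, 4)}
u, v, length = farthest_pair(cells)
```

Running one breadth-first search per cell costs time quadratic in the number of cells, which is acceptable for a small illustration but not tuned for large sets.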

Proof. Suppose $d_H(\{x, y\}, \{f(0), f(1)\}) > 16\epsilon$. Then, either $x$ or $y$ must
be farther than $16\epsilon$ from $f(0)$ or $f(1)$. Suppose without loss of generality
that

min length(7r) < min length(7r). 

That is, we are labeling the two points so that $x$ is "paired" with $f(0)$ and
$y$ is "paired" with $f(1)$. Assume that $\|x - f(0)\| > 16\epsilon$; we show that a
contradiction follows.

Because $\hat\Gamma \subset \Gamma \oplus (4\epsilon)$, it follows that $x$ lies on one of the fibers through
$\Gamma$. (That is, it lies in the "body" of $\hat\Gamma$, not in the "caps," whose points are
all $\le \epsilon$ from $f(0)$.) Call this fiber $\mathcal{F}$. Let $\pi$ be the shortest path from $x$ to
$y$, and let $\pi'$ be the shortest path from $f(0)$ to $y$. By the assignment of $x$
and $y$ above, it follows that $\pi'$ must pass through the fiber $\mathcal{F}$. Let $x'$ be the
point that $\pi'$ passes through on $\mathcal{F}$ and define $\ell(z)$ to be the length of the
shortest path from $z \in \hat\Gamma$ to $y$.

Again because $\hat\Gamma \subset \Gamma \oplus (4\epsilon)$, it follows that $\mathcal{F} \cap \hat\Gamma$ has length $\le 8\epsilon$. Because
$f$ is an open curve, it follows immediately that $\hat\Gamma$ is simply connected. We
claim that $|\ell(x') - \ell(x)| \le \|x - x'\|$. To see this, let $\pi_0$ be the shortest path
within $\hat\Gamma$ from $x$ to $y$ and $\ell(x)$ be its length. Consider the following path
joining $x'$ to $y$: start at $x'$, move linearly to $x$ along the fiber, then follow $\pi_0$.
From the previous lemma, this path is entirely within $\hat\Gamma$. Since the length of
this path is $\|x - x'\| + \ell(x)$, we have $\ell(x') \le \ell(x) + \|x - x'\|$ and

$$\ell(x') - \ell(x) \le \|x - x'\|.$$

Now invert the roles of $x$ and $x'$ and get

$$\ell(x) - \ell(x') \le \|x - x'\|$$

so that the claim follows.
Now,

$$\|f(0) - x'\| \ge \|f(0) - x\| - \|x - x'\| > 16\epsilon - 8\epsilon = 8\epsilon.$$

It follows that $\ell(f(0)) > 8\epsilon + \ell(x') \ge 8\epsilon + \ell(x) - 8\epsilon = \ell(x)$, which contradicts
the assumption that $\|x - f(0)\| > 16\epsilon$. Applying this same argument to $y$
and $f(1)$ shows by contradiction that $\|y - f(1)\| \le 16\epsilon$. This proves the
theorem. □

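6.4. Estimating the Boundaries. Now we consider estimating $\partial S_0$ and
$\partial S_1$. The estimators are defined in Theorem 28 but we need some preliminary
results first. Let $\partial\hat S$ be an estimate of $\partial S$ such that $d_H(\partial S, \partial\hat S) \le \epsilon$ and let
$\hat x_0$ and $\hat x_1$ be the endpoint estimators from Theorem 25, that are such that

$$\|\hat x_0 - f(0)\| \le C\epsilon, \qquad \|\hat x_1 - f(1)\| \le C\epsilon.$$

Define

$$\hat B_0 = B(\hat x_0, \hat\sigma + c\epsilon), \qquad \hat B_1 = B(\hat x_1, \hat\sigma + c\epsilon),$$
$$\hat E_0 = \partial\hat S \cap \hat B_0, \qquad \hat E_1 = \partial\hat S \cap \hat B_1.$$

Recall the definitions of $V_0$, $V_1$ and $a$ given in (31) and (32).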

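Theorem 26 Suppose that $d_H(\partial S, \partial\hat S) \le \epsilon$, $\|\hat x_0 - f(0)\| \le C\epsilon$,
$\|\hat x_1 - f(1)\| \le C\epsilon$ and that $\partial S$ is connected. Assume that $c > C + 1$. Let $a =
a(2 + c + C, \epsilon)$. Let $V_0 = V_0(a\sqrt{\epsilon})$ and $V_1 = V_1(a\sqrt{\epsilon})$. Then:

$$d_H(V_0, \hat E_0) \le b\sqrt{\epsilon} \quad \text{and} \quad d_H(V_1, \hat E_1) \le b\sqrt{\epsilon}$$

where $b = a(\psi + \sigma/\Delta)$.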

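Proof. Let $\hat x \in \hat E_0$. Thus $\|\hat x - \hat x_0\| \le \hat\sigma + c\epsilon$. There exists $x \in \partial S$ such
that $\|\hat x - x\| \le \epsilon$. Now,

$$\begin{aligned}
\|x - f(0)\| &\le \|x - \hat x\| + \|\hat x - \hat x_0\| + \|\hat x_0 - f(0)\| \\
&\le \epsilon + (\hat\sigma + c\epsilon) + C\epsilon \\
&\le \epsilon + (\sigma + (c + 1)\epsilon) + C\epsilon \\
&= \sigma + (2 + c + C)\epsilon.
\end{aligned}$$

Thus $x \in B(f(0), \sigma + (2 + c + C)\epsilon) \cap \partial S \subset V_0$, by Theorem 22, and
so $\hat E_0 \subset V_0 \oplus \epsilon$.

Now let $x \in V_0$. There exists $z \in B(f(0), \sigma + c\epsilon) \cap \partial S$ such that $\|x - z\| \le
b\sqrt{\epsilon}$. There is a $\hat z \in \partial\hat S$ such that $\|\hat z - z\| \le \epsilon$. Now,

$$\begin{aligned}
\|\hat z - \hat x_0\| &\le \|\hat z - z\| + \|z - f(0)\| + \|f(0) - \hat x_0\| \\
&\le \epsilon + \sigma + C\epsilon \\
&\le \hat\sigma + (C + 1)\epsilon \le \hat\sigma + c\epsilon.
\end{aligned}$$

Therefore $\hat z \in \hat E_0$ and so $V_0 \subset \hat E_0 \oplus (b\sqrt{\epsilon})$. □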



There is no guarantee that $\hat E_0$ and $\hat E_1$ are connected sets. But this is
crucial if we want to use them for the medial estimation procedure. Define
the completion of $\hat E_0$, denoted by $[\hat E_0]$, to be the smallest connected subset
of $\partial\hat S$ containing $\hat E_0$. That is,

$$[\hat E_0] = \bigcap \bigl\{ C : C \text{ is connected},\ C \subset \partial\hat S,\ \hat E_0 \subset C \bigr\}.$$

Define $[\hat E_1]$ similarly. Finally, define

$$R = \partial\hat S - ([\hat E_0] \cup [\hat E_1]).$$
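The completion step can be sketched in Python for a sampled boundary. The sketch assumes that the estimated boundary is a single closed curve given as $n$ ordered sample points, so that connected subsets correspond to cyclically contiguous runs of indices. Under that assumption, the smallest connected set containing the marked points is everything except the largest unmarked gap. The function name and the example index sets are hypothetical.

```python
def complete_on_cycle(n, marked):
    """Smallest cyclically contiguous run of indices in {0, ..., n-1} containing marked.

    Assumes the boundary is one closed curve sampled at n ordered points.
    """
    idx = sorted(set(marked))
    if not idx:
        return []
    m = len(idx)
    # Number of unmarked points between consecutive marked indices, going around the cycle.
    gaps = [((idx[(k + 1) % m] - idx[k]) % n or n) - 1 for k in range(m)]
    k = max(range(m), key=lambda t: gaps[t])   # the largest gap is left out
    start = idx[(k + 1) % m]
    length = n - gaps[k]
    return [(start + t) % n for t in range(length)]

# Illustrative example: two marked groups on a 12-point closed boundary.
n = 12
E0 = complete_on_cycle(n, {10, 11, 1, 2})
E1 = complete_on_cycle(n, {5, 7})
remainder = [i for i in range(n) if i not in set(E0) | set(E1)]
```

In this example the two completions are disjoint, and the remainder splits into two contiguous runs, which play the role of the two connected components of $R$.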

Now by construction, $[\hat E_0]$ and $[\hat E_1]$ are connected. If they are disjoint, it
follows that $R$ consists of two connected components. To make sure that the
completion procedure successfully combines elements of $\hat E_0$ without adding
other elements, we need the following.

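Theorem 27 Under the assumptions of Theorem 26:

$$\max_{x, y \in \hat E_0} \|x - y\| \le 2\hat\sigma + 2c\epsilon, \quad \text{and} \quad \max_{x, y \in \hat E_1} \|x - y\| \le 2\hat\sigma + 2c\epsilon.$$

If $\sigma < \|f(1) - f(0)\|/5$ then $\min_{x \in \hat E_0,\, y \in \hat E_1} \|x - y\| > 2\hat\sigma + 2c\epsilon$.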

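Proof. For any $x, y \in \hat E_0$, we have $\|x - y\| \le \|x - \hat x_0\| + \|y - \hat x_0\| \le 2\hat\sigma + 2c\epsilon$.
Now let $x \in \hat E_0$ and $y \in \hat E_1$. Now

$$\|\hat x_0 - \hat x_1\| \le \|\hat x_0 - x\| + \|x - y\| + \|y - \hat x_1\| \le 2\hat\sigma + 2c\epsilon + \|x - y\|.$$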
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Hence, 

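$$\begin{aligned}
\|x - y\| &\ge \|\hat x_0 - \hat x_1\| - 2\hat\sigma - 2c\epsilon \\
&\ge \|f(0) - f(1)\| - \|\hat x_0 - f(0)\| - \|\hat x_1 - f(1)\| - 2\hat\sigma - 2c\epsilon \\
&\ge \|f(0) - f(1)\| - 2C\epsilon - 2\hat\sigma - 2c\epsilon \\
&= \|f(0) - f(1)\| - 2\hat\sigma - 2(C + c)\epsilon \\
&> 5\sigma - 2\hat\sigma - 2(C + c)\epsilon \\
&= 3\sigma + 2\sigma - 2\hat\sigma - 2(C + c)\epsilon \\
&\ge 3\sigma + 2\hat\sigma - 2\epsilon - 2\hat\sigma - 2(C + c)\epsilon \\
&= 3\sigma - 2(C + c + 1)\epsilon > 2\hat\sigma + 2c\epsilon.
\end{aligned}$$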

□ 

Combining the above results we have the following. 

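Theorem 28 Suppose that $d_H(\partial S, \partial\hat S) \le \epsilon$, $\|\hat x_0 - f(0)\| \le C\epsilon$,
$\|\hat x_1 - f(1)\| \le C\epsilon$ and that $\partial S$ is connected. Assume that $c > C + 1$. Let $a = a(c, \epsilon)$.
Let $V_0 = V_0(a\sqrt{\epsilon})$ and $V_1 = V_1(a\sqrt{\epsilon})$. If $\sigma < \|f(1) - f(0)\|/5$ then:

1. $d_H(V_0, [\hat E_0]) \le c_1\sqrt{\epsilon}$ and $d_H(V_1, [\hat E_1]) \le c_1\sqrt{\epsilon}$.

2. $R$ consists of two connected components, $\partial\hat S_0$ and $\partial\hat S_1$, say.

3. $d_H(\partial S_0, \partial\hat S_0) \le c_2\sqrt{\epsilon}$ and $d_H(\partial S_1, \partial\hat S_1) \le c_2\sqrt{\epsilon}$.

Thus, statement 2 of the above theorem defines the estimators $\partial\hat S_0$ and
$\partial\hat S_1$.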

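Proof. Parts 1 and 2 follow easily. Let us turn to 3. Let $y = f(u) +
\sigma N(u) \in \partial S_0$ where $0 \le u \le 1$. First suppose that $A\sqrt{\epsilon} \le u \le 1 - A\sqrt{\epsilon}$
where $A = a(2 + c + C, \epsilon)$. Then $y \notin B(f(0), \sigma + (2 + c + C)\epsilon)$. There exists
$\hat y \in \partial\hat S$ such that $\|\hat y - y\| \le \epsilon$. So

$$\sigma + (2 + c + C)\epsilon < \|y - f(0)\| \le \|y - \hat y\| + \|\hat y - \hat x_0\| + \|\hat x_0 - f(0)\| \le \epsilon + \|\hat y - \hat x_0\| + C\epsilon$$

and so

$$\|\hat y - \hat x_0\| > \sigma + (2 + c + C)\epsilon - \epsilon - C\epsilon = \sigma + (1 + c)\epsilon \ge \hat\sigma + c\epsilon.$$

Thus, $\hat y \notin \hat E_0$. A similar argument shows that $\hat y \notin \hat E_1$ and $\hat y \notin \partial\hat S_1$. Hence
$\hat y \in \partial\hat S_0$. Now suppose that $0 \le u < A\sqrt{\epsilon}$. From Lemma 30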



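$$\|f(u) - f(A\sqrt{\epsilon})\| \le c_1\sqrt{\epsilon}$$

for some $c_1$. From the first part of the proof, there is a $\hat y \in \partial\hat S_0$ such that
$\|f(A\sqrt{\epsilon}) - \hat y\| \le \epsilon$. But $\|f(u) - \hat y\| \le \|f(u) - f(A\sqrt{\epsilon})\| + \|f(A\sqrt{\epsilon}) - \hat y\| \le
\epsilon + c_1\sqrt{\epsilon} \le c_2\sqrt{\epsilon}$, say. Hence, $\partial S_0 \subset \partial\hat S_0 \oplus c_2\sqrt{\epsilon}$.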

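Now let $\hat y$ be in $\partial\hat S_0$. Hence, $\|\hat y - \hat x_0\| > \hat\sigma + c\epsilon$. Let $y \in \partial S$ be such that
$\|\hat y - y\| \le \epsilon$. Now

$$\hat\sigma + c\epsilon < \|\hat y - \hat x_0\| \le \|\hat y - y\| + \|y - f(0)\| + \|f(0) - \hat x_0\| \le \epsilon + \|y - f(0)\| + C\epsilon$$

and so

$$\|y - f(0)\| > \hat\sigma + (c - 1 - C)\epsilon \ge \sigma + (c - 1 - C)\epsilon = \sigma + \gamma\epsilon$$

where $\gamma = c - 1 - C$. It follows that $y \notin (V_0(\gamma\sqrt{\epsilon}) \cup V_1(\gamma\sqrt{\epsilon}))$. That is,
$y = f(u) + \sigma N(u)$ with $a(\gamma, \epsilon)\sqrt{\epsilon} \le u \le 1 - a(\gamma, \epsilon)\sqrt{\epsilon}$. Arguing as above,
using Lemma 30, there is a $v$ such that $a(c, \epsilon)\sqrt{\epsilon} \le v \le 1 - a(c, \epsilon)\sqrt{\epsilon}$ and
such that $\|(f(v) + \sigma N(v)) - (f(u) + \sigma N(u))\| \le c_3\sqrt{\epsilon}$ for some $c_3$. Hence,
$\partial\hat S_0 \subset \partial S_0 \oplus c_3\sqrt{\epsilon}$. A similar argument applies to $\partial S_1$ and $\partial\hat S_1$. The theorem
follows by taking $c_4 = \max\{c_2, c_3\}$. □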

Theorem 29 Let $\partial\hat S_0$ and $\partial\hat S_1$ be the estimators described in statement
2 of Theorem 28. Let $\hat\Gamma$ be the medial estimator derived from $\partial\hat S_0$ and $\partial\hat S_1$.
Then the results of Theorem 10 hold.

Proof of Theorem 29. Follows by combining the last four results. □

Lemma 30 (Niyogi et al. (2008)) If $\|f(u) - f(v)\| \le \Delta/2$ then

"(«.»)-%^ < < < A-Ay/lTlEMZ2MI^ 
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